I. ElasticSearch Overview

Official site: https://www.elastic.co/cn/downloads/elasticsearch

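Elasticsearch (ES for short) is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. It scales well, up to hundreds of servers, and can handle petabyte-scale data. ES is written in Java and uses Lucene at its core for all indexing and searching, but its goal is to hide Lucene's complexity behind a simple RESTful API and make full-text search easy.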

According to DB-Engines, an authoritative database ranking site, Elasticsearch overtook Solr and others in January 2016 to become the most popular search engine.

Summary: Elasticsearch vs. Solr

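1. ES is basically ready to use out of the box (just unzip it!), which is very simple. Installing Solr is a little more involved.
2. Solr relies on ZooKeeper for distributed coordination, whereas Elasticsearch has distributed coordination built in.
3. Solr supports more data formats, such as JSON, XML and CSV, whereas Elasticsearch works primarily with JSON.
4. Solr officially ships more features, whereas Elasticsearch focuses on core functionality and leaves many advanced features to third-party plugins; a graphical interface, for example, relies on Kibana.
5. Solr is fast at querying but slow when the index is updated (inserts and deletes are slow), so it suits query-heavy applications such as e-commerce.

  • Elasticsearch builds its index quickly, so newly written data becomes searchable almost immediately (fast real-time search), although plain queries may be slower; it is used for search at companies such as Facebook and Sina.
  • Solr is a strong solution for traditional search applications, while Elasticsearch is better suited to emerging real-time search applications.

6. Solr is more mature and has a larger, more established community of users, developers and contributors. Elasticsearch has comparatively fewer developers and maintainers and changes very quickly, which raises the cost of learning and using it.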

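II. Installing ElasticSearch

Installing on Windows

1. Install

Download: https://www.elastic.co/cn/downloads/

Previous releases: https://www.elastic.co/cn/downloads/past-releases/

Just unzip the archive (ideally keep all ElasticSearch-related tools in one directory).

2. Directory layout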

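    bin                   startup scripts
    config                configuration files
        log4j2.properties     logging configuration
        jvm.options           JVM settings (the heap defaults to 1 GB; lower it if your machine has less memory)
        elasticsearch.yml     main ElasticSearch configuration (default port 9200, CORS settings, ...)
    lib                   bundled JAR files
    modules               feature modules
    plugins               plugins (e.g. the IK analyzer)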

3. Start

Run elasticsearch.bat in the bin directory.

Then open: http://localhost:9200

    {
      "name" : "TIANYH",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "IOHRCRK6TKibMGdNZq4YtA",
      "version" : {
        "number" : "7.6.1",
        "build_flavor" : "default",
        "build_type" : "zip",
        "build_hash" : "aa751e09be0a5072e8570670309b1f12348f023b",
        "build_date" : "2020-02-29T00:15:25.529771Z",
        "build_snapshot" : false,
        "lucene_version" : "8.4.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }

Installing a web UI

elasticsearch-head

Prerequisite: Node.js must be installed.

1. Download

https://github.com/mobz/elasticsearch-head

2. Install

Just unzip it (ideally keep all ElasticSearch-related tools in one directory).

3. Start

    cd elasticsearch-head
    # install dependencies
    npm install
    # start
    npm run start
    # then open http://localhost:9100/

Enable CORS (add the following to config/elasticsearch.yml in the ElasticSearch directory):

    # enable CORS
    http.cors.enabled: true
    # allow requests from any origin
    http.cors.allow-origin: "*"

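Restart Elasticsearch.

Notes:

  • For beginners:
    • an index can be thought of as a "database"
    • a type can be thought of as a "table"
    • a document can be thought of as "a row of data in a table"
  • We only use head as a data viewer; all queries from here on are run in Kibana
    • because head does not format JSON, which makes it inconvenient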

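Installing Kibana

Kibana is an open-source analytics and visualization platform for Elasticsearch, used to search, view and interact with data stored in Elasticsearch indices. With Kibana you can perform advanced data analysis and present the results in a variety of charts, making large volumes of data easier to understand. It is easy to use: its browser-based interface lets you quickly build dashboards that display Elasticsearch queries in real time. Setting it up is simple too: no coding or extra infrastructure is required, and you can install Kibana and start monitoring Elasticsearch indices within minutes.

1. Download

The Kibana version must match your ElasticSearch version.

https://www.elastic.co/cn/downloads/

Previous releases: https://www.elastic.co/cn/downloads/past-releases/

2. Install

Just unzip it (ideally keep all ElasticSearch-related tools in one directory).

3. Start

Run kibana.bat in the bin directory.

Then open: http://localhost:5601

4. Switch the Kibana UI to Chinese

Open config/kibana.yml under the Kibana directory in an editor and add:

    i18n.locale: "zh-CN"

Restart Kibana.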

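About ELK

  • ELK is the acronym formed from the initials of three open-source projects: Elasticsearch, Logstash and Kibana. It is also known as the Elastic Stack.
    • Elasticsearch is a distributed, near-real-time search platform built on Lucene and accessed through a RESTful interface.
      • Large-scale full-text search scenarios such as Baidu or Google can use Elasticsearch as the underlying engine, which shows how capable its search is. It is commonly abbreviated as es.
    • Logstash is the central data-flow engine of ELK. It collects data in different formats from different sources (files, data stores, message queues), filters it, and ships it to different destinations (files, MQ, Redis, Elasticsearch, Kafka, etc.).
    • Kibana presents Elasticsearch data through a friendly web interface and provides real-time analysis.
  • Many developers describe ELK simply as a log-analysis stack. In fact ELK is not limited to log analysis; it supports any data collection and analysis scenario. Log collection and analysis is merely the most representative use case, not the only one.

Collect and clean data (Logstash) ==> search and store (ElasticSearch) ==> visualize (Kibana)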

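III. ElasticSearch Core Concepts

Overview

1. Index (ElasticSearch)

  • Contains multiple shards

2. Field types (mapping)

  • Maps each field to a type (integer, string, ...)

3. Document

4. Shard (a Lucene index, built on an inverted index)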

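ElasticSearch is document-oriented. Here is an objective comparison of a relational database and ElasticSearch. Everything is JSON!

    Relational DB       | ElasticSearch
    Database            | Index (indices)
    Table               | Type (types)
    Row                 | Document (documents)
    Column              | Field (fields)

An Elasticsearch cluster can contain multiple indices (databases); each index can contain multiple types (tables); each type contains multiple documents (rows); and each document contains multiple fields (columns).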

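Physical design:

Behind the scenes, Elasticsearch splits each index into multiple shards, and each shard can be moved between different servers in the cluster.

A single node is already a cluster: a freshly started ElasticSearch service forms a cluster by default, and that cluster is named elasticsearch.

Logical design:

An index type contains multiple documents, e.g. document 1 and document 2. To locate a document we follow the sequence index => type => document ID; this combination identifies one specific document. Note: the ID does not have to be an integer; it is actually a string.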

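Document ("row")

As mentioned, Elasticsearch is document-oriented, which means the smallest unit of indexing and searching is the document. In Elasticsearch a document has several important properties:

  • Self-contained: a document holds both the fields and their values, i.e. key:value pairs.
  • Hierarchical: a document can contain sub-documents, which is how complex logical entities are represented.
  • Flexible structure: documents do not depend on a predefined schema. In a relational database, columns must be defined before they can be used; in Elasticsearch fields are very flexible, and we can omit a field or add a new one dynamically.

Although we can freely add or omit fields, each field's type still matters: an age field, for example, could be stored as a string or as an integer, and the two behave differently. That is why Elasticsearch stores the mapping between fields and types, along with other settings. This mapping is specific to each type in each index, which is why types are sometimes called mapping types.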

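Type ("table")

A type is a logical container for documents, just as a table is a container for rows in a relational database. The definition of the fields in a type is called its mapping; for example, name might be mapped to a string type. We say documents are schemaless: they do not need to contain every field defined in the mapping. So what does Elasticsearch do when a document brings a new field?

  • Elasticsearch adds the new field to the mapping automatically. Since it does not know the field's type, it guesses: if the value is 18, it assumes an integer type. But the guess can be wrong, so the safest approach is to define the mapping you need in advance. Here Elasticsearch and relational databases end up in the same place: define the fields first, then use them, and avoid surprises.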
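As a rough illustration of that guessing, the sketch below maps a few JSON value kinds to the field type Elasticsearch's dynamic mapping typically picks: whole numbers become long, decimals float, booleans boolean, and strings text with a keyword sub-field (as in the test3 mapping later in this article). It is a simplified model, not Elasticsearch code; the class name is hypothetical and date detection is approximated by a single yyyy-MM-dd pattern.

```java
import java.util.Map;

// Hypothetical, simplified model of Elasticsearch dynamic mapping; not the real implementation.
final class DynamicMappingSketch {

    /** Returns the field type Elasticsearch would typically infer for a JSON value. */
    static String guessType(Object jsonValue) {
        if (jsonValue instanceof Boolean) {
            return "boolean";
        }
        if (jsonValue instanceof Integer || jsonValue instanceof Long) {
            return "long"; // JSON whole numbers, e.g. age: 18
        }
        if (jsonValue instanceof Float || jsonValue instanceof Double) {
            return "float"; // JSON decimals
        }
        if (jsonValue instanceof String) {
            String s = (String) jsonValue;
            // Date detection is on by default; approximated here by a yyyy-MM-dd check only.
            if (s.matches("\\d{4}-\\d{2}-\\d{2}")) {
                return "date";
            }
            return "text+keyword"; // text with a keyword sub-field (ignore_above: 256)
        }
        if (jsonValue instanceof Map) {
            return "object"; // nested JSON object
        }
        return "unknown";
    }
}
```

Defining the mapping up front, as step 3 of section V does with PUT /test2, removes the need for any of this guessing.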

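Index ("database")

An index is a container of mapping types; an index in Elasticsearch is a very large collection of documents. The index stores the fields and other settings of its mapping types, and the data itself is stored across shards. Let's look at how shards work.

A cluster has at least one node, and a node is one Elasticsearch process. A node can hold multiple indices. Before Elasticsearch 7.0, a new index was created by default with 5 primary shards; since 7.0 the default is 1 primary shard (the number_of_shards: 1 in the test2 output later in this article reflects this). Each primary shard has one replica shard by default.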
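To make the primary/replica rule concrete, here is a toy placement scheme: primary shard p goes to node p % n and its replica to the next node, so a copy never shares a node with its primary. On a single node the replica stays unassigned, which is why a one-node cluster reports yellow health. The round-robin scheme and the class name are assumptions for illustration; Elasticsearch's real allocator weighs many more factors.

```java
// Hypothetical toy allocator, NOT Elasticsearch's real shard allocation algorithm.
final class ShardPlacementSketch {

    /**
     * Places primary shard p on node p % nodes and its single replica on the next node.
     * Returns {primaryNode, replicaNode} per shard; replicaNode is -1 (unassigned) on a one-node cluster.
     */
    static int[][] place(int primaries, int nodes) {
        if (primaries < 1 || nodes < 1) {
            throw new IllegalArgumentException("need at least one shard and one node");
        }
        int[][] placement = new int[primaries][2];
        for (int p = 0; p < primaries; p++) {
            placement[p][0] = p % nodes;
            // With two or more nodes, (p + 1) % nodes never equals p % nodes.
            placement[p][1] = nodes < 2 ? -1 : (p + 1) % nodes;
        }
        return placement;
    }
}
```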

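In a cluster of 3 nodes, a primary shard and its replica are never placed on the same node, so if one node goes down the data is not lost. In fact, a shard is a Lucene index (one Elasticsearch index consists of several Lucene indices): a directory of files containing an inverted index. The structure of the inverted index lets Elasticsearch tell you which documents contain a given keyword without scanning every document. But wait, what exactly is an inverted index?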

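Inverted index (the foundation of a Lucene index)

Simply put, an inverted index is built from (keyword, documents containing it) pairs, so the documents containing a keyword can be found directly from the keyword without examining every document, as illustrated in the figure below.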
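A minimal sketch of the idea, assuming the documents have already been split into terms: the index maps each term to the set of document IDs that contain it, so a lookup is a single map access instead of a scan. The class name is hypothetical, and Lucene's real inverted index also stores term frequencies, positions and compressed postings.

```java
import java.util.*;

// Hypothetical in-memory inverted index: term -> sorted set of document IDs.
final class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Indexes a document whose text has already been split into terms. */
    void add(int docId, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the IDs of documents containing the term, without scanning any document. */
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }
}
```

For example, after adding document 1 with terms [study, every, day] and document 2 with [study, is, good], lookup("study") returns {1, 2} and lookup("day") returns {1}.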

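IV. IK Analyzer (an Elasticsearch plugin)

IK analyzer: a Chinese tokenizer

Tokenization splits a piece of Chinese (or other) text into individual keywords. When we search, our query is tokenized, the data in the index is tokenized, and the resulting terms are matched against each other. Without the IK analyzer, the default analyzer treats every Chinese character as a separate word: "我爱狂神", for example, is split into "我", "爱", "狂", "神", which is clearly not what we want. So we install the IK Chinese analyzer to solve this problem.

IK provides two analysis algorithms, ik_smart and ik_max_word: ik_smart produces the coarsest split (fewest tokens), while ik_max_word produces the finest-grained split.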
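To illustrate the difference between the two modes, here is a toy segmenter over a tiny hand-picked dictionary. It is NOT IK's real algorithm: smart() uses plain forward maximum matching, and maxWord() emits every dictionary word found at every position plus any character no word covers. With the assumed dictionary {白日, 黄河, 入海流, 入海, 海流} it reproduces the two Kibana outputs shown in step 5.

```java
import java.util.*;

// Hypothetical toy segmenter; NOT the IK analyzer's actual implementation.
final class SegmentSketch {
    private final Set<String> dict;
    private final int maxLen;

    SegmentSketch(Set<String> dict) {
        this.dict = dict;
        this.maxLen = dict.stream().mapToInt(String::length).max().orElse(1);
    }

    /** Coarse mode: at each position take the longest dictionary word, else a single character. */
    List<String> smart(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLen, text.length() - i);
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--;
            }
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }

    /** Fine mode: every dictionary word at every position, plus characters no word covers. */
    List<String> maxWord(String text) {
        List<String> out = new ArrayList<>();
        boolean[] covered = new boolean[text.length()];
        for (int i = 0; i < text.length(); i++) {
            for (int len = Math.min(maxLen, text.length() - i); len >= 1; len--) {
                String word = text.substring(i, i + len);
                if (dict.contains(word)) {
                    out.add(word);
                    Arrays.fill(covered, i, i + len, true);
                }
            }
            if (!covered[i]) {
                out.add(text.substring(i, i + 1)); // uncovered single character
            }
        }
        return out;
    }
}
```

For "白日依山尽黄河入海流", smart() yields [白日, 依, 山, 尽, 黄河, 入海流] and maxWord() additionally yields 入海 and 海流.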

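1. Download

The version must match your ElasticSearch version.

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases

2. Install

Create an ik folder yourself.

Just unzip the archive, but into the ik folder under ElasticSearch's plugins directory.

3. Restart ElasticSearch so that it loads the plugin.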

4. Use bin/elasticsearch-plugin in the ElasticSearch installation directory to list the installed plugins:

E:\ElasticSearch\elasticsearch-7.6.1\bin>elasticsearch-plugin list

5. Test with Kibana

ik_smart: coarsest split (fewest tokens)

    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": "白日依山尽黄河入海流"
    }

Response:

    {
      "tokens" : [
        { "token" : "白日", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
        { "token" : "依", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 1 },
        { "token" : "山", "start_offset" : 3, "end_offset" : 4, "type" : "CN_CHAR", "position" : 2 },
        { "token" : "尽", "start_offset" : 4, "end_offset" : 5, "type" : "CN_CHAR", "position" : 3 },
        { "token" : "黄河", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 4 },
        { "token" : "入海流", "start_offset" : 7, "end_offset" : 10, "type" : "CN_WORD", "position" : 5 }
      ]
    }

ik_max_word: finest-grained split (exhausts every possible word in the dictionary)

    GET _analyze
    {
      "analyzer": "ik_max_word",
      "text": "白日依山尽黄河入海流"
    }

Response:

    {
      "tokens" : [
        { "token" : "白日", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
        { "token" : "依", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 1 },
        { "token" : "山", "start_offset" : 3, "end_offset" : 4, "type" : "CN_CHAR", "position" : 2 },
        { "token" : "尽", "start_offset" : 4, "end_offset" : 5, "type" : "CN_CHAR", "position" : 3 },
        { "token" : "黄河", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 4 },
        { "token" : "入海流", "start_offset" : 7, "end_offset" : 10, "type" : "CN_WORD", "position" : 5 },
        { "token" : "入海", "start_offset" : 7, "end_offset" : 9, "type" : "CN_WORD", "position" : 6 },
        { "token" : "海流", "start_offset" : 8, "end_offset" : 10, "type" : "CN_WORD", "position" : 7 }
      ]
    }

6. Add custom words to an extension dictionary

<elasticsearch directory>/plugins/ik/config/IKAnalyzer.cfg.xml

Open IKAnalyzer.cfg.xml and register the extension dictionary:

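    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- local extension dictionary -->
        <entry key="ext_dict">my.dic</entry>
        <!-- local extension stopword dictionary -->
        <entry key="ext_stopwords"></entry>
        <!-- remote extension dictionary -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!-- remote extension stopword dictionary -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
    </properties>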

Write my.dic in the same config directory, one word per line, then restart ElasticSearch:

    白日依山尽
    黄河入海流
    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": "白日依山尽黄河入海流"
    }

Response:

    {
      "tokens" : [
        { "token" : "白日依山尽", "start_offset" : 0, "end_offset" : 5, "type" : "CN_WORD", "position" : 0 },
        { "token" : "黄河入海流", "start_offset" : 5, "end_offset" : 10, "type" : "CN_WORD", "position" : 1 }
      ]
    }

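V. REST Style

REST is a software architectural style, not a standard: it provides a set of design principles and constraints. It is mainly used for software in which clients interact with servers. Software designed in this style tends to be simpler and more layered, and makes mechanisms such as caching easier to implement.

Basic REST commands:

    method               | URL                                                      | description
    PUT (create/update)  | localhost:9200/index_name/type_name/document_id          | create a document (with the given id)
    POST (create)        | localhost:9200/index_name/type_name                      | create a document (random id)
    POST (update)        | localhost:9200/index_name/type_name/document_id/_update  | update a document
    DELETE (delete)      | localhost:9200/index_name/type_name/document_id          | delete a document
    GET (query)          | localhost:9200/index_name/type_name/document_id          | get a document by id
    POST (query)         | localhost:9200/index_name/type_name/_search              | query all data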
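The table maps naturally onto plain HTTP requests. The sketch below only builds (does not send) them with the JDK's java.net.http.HttpRequest (Java 11+), assuming a local node at http://localhost:9200 and the typeless 7.x paths that use _doc in place of a custom type name; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (does not send) requests matching the REST table above.
final class EsRestRequests {
    static final String BASE = "http://localhost:9200"; // assumed local node

    private static HttpRequest.Builder at(String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json");
    }

    /** PUT /{index}/_doc/{id}: create or replace a document with a given id. */
    static HttpRequest putDoc(String index, String id, String json) {
        return at("/" + index + "/_doc/" + id).PUT(HttpRequest.BodyPublishers.ofString(json)).build();
    }

    /** POST /{index}/_doc: create a document with an auto-generated id. */
    static HttpRequest postDoc(String index, String json) {
        return at("/" + index + "/_doc").POST(HttpRequest.BodyPublishers.ofString(json)).build();
    }

    /** GET /{index}/_doc/{id}: fetch a document by id. */
    static HttpRequest getDoc(String index, String id) {
        return at("/" + index + "/_doc/" + id).GET().build();
    }

    /** DELETE /{index}/_doc/{id}: delete a document. */
    static HttpRequest deleteDoc(String index, String id) {
        return at("/" + index + "/_doc/" + id).DELETE().build();
    }

    /** POST /{index}/_search: search the index with a JSON query body. */
    static HttpRequest search(String index, String queryJson) {
        return at("/" + index + "/_search").POST(HttpRequest.BodyPublishers.ofString(queryJson)).build();
    }
}
```

With Elasticsearch running, a request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).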

Test 1: create an index and add a document

    PUT /test/type/1
    {
      "name": "测试",
      "age": 18
    }

Response:

    {
      "_index" : "test", "_type" : "type", "_id" : "1", "_version" : 1, "result" : "created",
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "_seq_no" : 0, "_primary_term" : 1
    }

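2. Field data types

  • String types
    • text and keyword
      • text: analyzed (tokenized) for full-text search; supports fuzzy and exact queries but not aggregations or sorting; no maximum length, so it suits large fields.
      • keyword: not analyzed, indexed as a whole; supports fuzzy and exact matching, aggregations and sorting. A single keyword value can be at most 32,766 bytes of UTF-8 (the Lucene term limit). ignore_above sets a maximum length; longer values are not indexed and therefore cannot be found with an exact term query.
  • Numeric types
    • long, integer, short, byte, double, float, half_float, scaled_float
  • Date type
    • date
  • Boolean type
    • boolean
  • Binary type
    • binary
  • and more…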

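3. Specify field types (using PUT)

This is like creating a database (defining the index and the type of each field); you can also think of it as setting up rules.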

    PUT /test2
    {
      "mappings": {
        "properties": {
          "name": { "type": "text" },
          "age": { "type": "long" },
          "birthday": { "type": "date" }
        }
      }
    }

Response:

    { "acknowledged" : true, "shards_acknowledged" : true, "index" : "test2" }

4. Retrieve the rules created in step 3

    GET test2

Response:

    {
      "test2" : {
        "aliases" : { },
        "mappings" : {
          "properties" : {
            "age" : { "type" : "long" },
            "birthday" : { "type" : "date" },
            "name" : { "type" : "text" }
          }
        },
        "settings" : {
          "index" : {
            "creation_date" : "1676438148562",
            "number_of_shards" : "1",
            "number_of_replicas" : "1",
            "uuid" : "d-qUkOZKQJKzd68KHiN_pw",
            "version" : { "created" : "7060199" },
            "provided_name" : "test2"
          }
        }
      }
    }

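5. Default mapping

_doc is the default type. Types are being phased out (7.x allows only one type per index, and 8.0 removes them), so this default type stands in for them.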

    PUT /test3/_doc/1
    {
      "name": "黄河",
      "age": 18
    }

Response:

    {
      "_index" : "test3", "_type" : "_doc", "_id" : "1", "_version" : 1, "result" : "created",
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "_seq_no" : 0, "_primary_term" : 1
    }

    GET test3

Response:

    {
      "test3" : {
        "aliases" : { },
        "mappings" : {
          "properties" : {
            "age" : { "type" : "long" },
            "name" : {
              "type" : "text",
              "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } }
            }
          }
        },
        "settings" : {
          "index" : {
            "creation_date" : "1676438576004",
            "number_of_shards" : "1",
            "number_of_replicas" : "1",
            "uuid" : "QmHErZuzSvmczgtgyzC7oA",
            "version" : { "created" : "7060199" },
            "provided_name" : "test3"
          }
        }
      }
    }

If you do not specify the types of your document's fields, ElasticSearch assigns default types for you.

Extra: GET _cat/ lists many endpoints that report ElasticSearch's current state!

    =^.^=
    /_cat/allocation
    /_cat/shards
    /_cat/shards/{index}
    /_cat/master
    /_cat/nodes
    /_cat/tasks
    /_cat/indices
    /_cat/indices/{index}
    /_cat/segments
    /_cat/segments/{index}
    /_cat/count
    /_cat/count/{index}
    /_cat/recovery
    /_cat/recovery/{index}
    /_cat/health
    /_cat/pending_tasks
    /_cat/aliases
    /_cat/aliases/{alias}
    /_cat/thread_pool
    /_cat/thread_pool/{thread_pools}
    /_cat/plugins
    /_cat/fielddata
    /_cat/fielddata/{fields}
    /_cat/nodeattrs
    /_cat/repositories
    /_cat/snapshots/{repository}
    /_cat/templates

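6. Update

There are two approaches.

① The old way (overwrite with PUT)

  • The version (_version) is incremented by 1.
  • The document is replaced as a whole, so any field you leave out of the request disappears.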
    PUT /test/type/1
    {
      "name": "测试",
      "age": 19
    }

    GET /test/_doc/1

Response:

    {
      "_index" : "test", "_type" : "_doc", "_id" : "1", "_version" : 2, "_seq_no" : 1, "_primary_term" : 1,
      "found" : true,
      "_source" : { "name" : "测试", "age" : 19 }
    }

    PUT /test/type/1
    {
      "age": 20
    }

    GET /test/_doc/1

Response (the name field is gone):

    {
      "_index" : "test", "_type" : "_doc", "_id" : "1", "_version" : 3, "_seq_no" : 2, "_primary_term" : 1,
      "found" : true,
      "_source" : { "age" : 20 }
    }

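② The newer way (POST with _update)

  • The version is still incremented; it stays the same only when the update changes nothing (a noop).
  • The fields to change must be wrapped in a doc object.
  • Fields that are not mentioned are kept.
  • In 7.x the preferred endpoint is POST /test/_update/1; the /test/_doc/1/_update form used below is the older, typed one.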
    POST /test/_doc/1/_update
    {
      "doc": {
        "age": 11
      }
    }

    GET /test/_doc/1

Response (name is kept):

    {
      "_index" : "test", "_type" : "_doc", "_id" : "1", "_version" : 5, "_seq_no" : 4, "_primary_term" : 1,
      "found" : true,
      "_source" : { "name" : "测试", "age" : 11 }
    }
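The difference between the two approaches can be modeled with plain maps: a PUT body becomes the whole new _source, while an _update doc is merged into the existing one. This is a hypothetical sketch; the class name is made up, and it merges top-level fields only, whereas real partial updates also merge nested objects recursively.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of how _source changes under PUT vs. _update (top-level fields only).
final class UpdateSemanticsSketch {

    /** PUT /{index}/_doc/{id}: the request body becomes the whole new _source. */
    static Map<String, Object> put(Map<String, Object> oldSource, Map<String, Object> body) {
        return new LinkedHashMap<>(body); // oldSource is ignored: fields missing from body are lost
    }

    /** POST /{index}/_update/{id} with {"doc": partial}: listed fields are overwritten, the rest are kept. */
    static Map<String, Object> update(Map<String, Object> oldSource, Map<String, Object> partialDoc) {
        Map<String, Object> merged = new LinkedHashMap<>(oldSource);
        merged.putAll(partialDoc);
        return merged;
    }
}
```

Starting from {name: 测试, age: 19}, put(..., {age: 20}) leaves only {age: 20}, matching the first approach above, while update(..., {age: 11}) gives {name: 测试, age: 11}, matching the second.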

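7. Delete

DELETE /test deletes the whole test index; to delete a single document, use DELETE /test/_doc/1.

    DELETE /test

Response:

    { "acknowledged" : true }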

8. Query (simple condition)

    GET /test/_doc/_search?q=age:19

Response:

    {
      "took" : 1, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 1, "relation" : "eq" },
        "max_score" : 1.0,
        "hits" : [
          { "_index" : "test", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "name" : "测试", "age" : 19 } }
        ]
      }
    }

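9. Complex queries

① match query

  • match: matching; the query text is analyzed first (tokenized), then searched
  • _source: choose which fields to return
  • sort: sorting
  • from and size: pagination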
An empty body returns all documents:

    GET /test/_doc/_search
    { }

Response:

    {
      "took" : 0, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 5, "relation" : "eq" },
        "max_score" : 1.0,
        "hits" : [
          { "_index" : "test", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "name" : "测试", "age" : 19 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "2", "_score" : 1.0, "_source" : { "name" : "小李", "age" : 19 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "3", "_score" : 1.0, "_source" : { "name" : "小张", "age" : 18 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "4", "_score" : 1.0, "_source" : { "name" : "小明", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "5", "_score" : 1.0, "_source" : { "name" : "明明", "age" : 16 } }
        ]
      }
    }
    GET /test/_doc/_search
    {
      "query": {
        "match": { "name": "明" }
      },
      "_source": ["age", "name"],
      "sort": [ { "age": { "order": "asc" } } ],
      "from": 0,
      "size": 20
    }

Response:

    {
      "took" : 0, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 2, "relation" : "eq" },
        "max_score" : null,
        "hits" : [
          { "_index" : "test", "_type" : "_doc", "_id" : "4", "_score" : null, "_source" : { "name" : "小明", "age" : 16 }, "sort" : [ 16 ] },
          { "_index" : "test", "_type" : "_doc", "_id" : "5", "_score" : null, "_source" : { "name" : "明明", "age" : 16 }, "sort" : [ 16 ] }
        ]
      }
    }
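Note that from is a document offset, not a page number. A small hypothetical helper for converting a 1-based page number, which also checks the index.max_result_window setting (10,000 by default), which caps from + size:

```java
// Hypothetical helper: converts a 1-based page number into the search API's from/size pair.
final class PagingSketch {
    /** Default value of the index.max_result_window setting. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** from is a document offset: page 1 starts at 0, page 2 at size, and so on. */
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    /** With default settings, from + size may not exceed index.max_result_window. */
    static boolean withinDefaultWindow(int from, int size) {
        return (long) from + size <= DEFAULT_MAX_RESULT_WINDOW;
    }
}
```

For example, page 3 with 20 hits per page means "from": 40, "size": 20.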

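② Multi-condition query (bool)

  • must: equivalent to AND
  • should: equivalent to OR
  • must_not: equivalent to NOT (documents matching any must_not clause are excluded)
  • filter: filtering; like must, but it does not affect the score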
    GET /test/_doc/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": 16 } },
            { "match": { "name": "小" } }
          ],
          "filter": {
            "range": {
              "age": { "gte": 15, "lte": 17 }
            }
          }
        }
      }
    }

Response:

    {
      "took" : 1, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 4, "relation" : "eq" },
        "max_score" : 1.2940125,
        "hits" : [
          { "_index" : "test", "_type" : "_doc", "_id" : "4", "_score" : 1.2940125, "_source" : { "name" : "小明", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "6", "_score" : 1.2940125, "_source" : { "name" : "小黄", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "7", "_score" : 1.2940125, "_source" : { "name" : "小黑", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "9", "_score" : 1.2940125, "_source" : { "name" : "小花", "age" : 16 } }
        ]
      }
    }
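The matching logic of a bool query (ignoring scoring) can be sketched over in-memory documents with predicates. This is a hypothetical model: the class name is made up, and it follows the documented default that minimum_should_match is 1 only when there are should clauses but no must or filter clauses, and 0 otherwise (so should clauses then only affect scoring).

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical model of which documents a bool query matches; scoring is ignored.
final class BoolQuerySketch<T> {
    final List<Predicate<T>> must = new ArrayList<>();
    final List<Predicate<T>> should = new ArrayList<>();
    final List<Predicate<T>> mustNot = new ArrayList<>();
    final List<Predicate<T>> filter = new ArrayList<>();

    boolean matches(T doc) {
        if (!must.stream().allMatch(p -> p.test(doc))) return false;   // AND
        if (!filter.stream().allMatch(p -> p.test(doc))) return false; // AND, no scoring
        if (mustNot.stream().anyMatch(p -> p.test(doc))) return false; // NOT (any)
        // Default minimum_should_match: 1 if there are only should clauses, otherwise 0.
        int minShould = (must.isEmpty() && filter.isEmpty() && !should.isEmpty()) ? 1 : 0;
        long satisfied = should.stream().filter(p -> p.test(doc)).count();
        return satisfied >= minShould;
    }
}
```

Modeling the request above, a must clause for age 16, a must clause for names containing 小, and a filter for 15 <= age <= 17 accept 小明 (16) but reject 明明 (16) and 小张 (18).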

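③ Matching multiple terms

  • A match clause targets a single field (it apparently cannot be combined with other fields in the same clause)
  • Several keywords can be given, separated by spaces; a document matching any of them is returned
  • match analyzes the query first (tokenizes it), then searches
  • i.e. it searches by terms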
    GET /test/_doc/_search
    {
      "query": {
        "match": { "name": "明 黑" }
      }
    }

Response:

    {
      "took" : 1, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 3, "relation" : "eq" },
        "max_score" : 1.9388659,
        "hits" : [
          { "_index" : "test", "_type" : "_doc", "_id" : "7", "_score" : 1.9388659, "_source" : { "name" : "小黑", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "5", "_score" : 1.4651942, "_source" : { "name" : "明明", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "4", "_score" : 1.0729234, "_source" : { "name" : "小明", "age" : 16 } }
        ]
      }
    }

④ Exact query

  • term looks up the given term directly in the inverted index, without analyzing it
  • suitable for number, date and keyword fields, not for text
    GET /test/_doc/_search
    {
      "query": {
        "term": { "age": 16 }
      }
    }

Response:

    {
      "took" : 0, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 5, "relation" : "eq" },
        "max_score" : 1.0,
        "hits" : [
          { "_index" : "test", "_type" : "_doc", "_id" : "4", "_score" : 1.0, "_source" : { "name" : "小明", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "5", "_score" : 1.0, "_source" : { "name" : "明明", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "6", "_score" : 1.0, "_source" : { "name" : "小黄", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "7", "_score" : 1.0, "_source" : { "name" : "小黑", "age" : 16 } },
          { "_index" : "test", "_type" : "_doc", "_id" : "9", "_score" : 1.0, "_source" : { "name" : "小花", "age" : 16 } }
        ]
      }
    }

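⑤ text and keyword

  • text:
    • analyzed for full-text search; supports fuzzy and exact queries, but not aggregations or sorting;
    • no maximum length, suitable for large fields;
  • keyword:
    • not analyzed, indexed as a whole; supports fuzzy and exact matching, aggregations and sorting;
    • a single keyword value can be at most 32,766 bytes of UTF-8 (the Lucene term limit). ignore_above sets the maximum length; longer values are not indexed and cannot be found with an exact term query.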
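Why a match on "测试" finds the text field but not the keyword field comes down to which tokens end up in the index. The toy model below uses hypothetical, simplified analyzers, NOT Lucene's real StandardAnalyzer or KeywordAnalyzer: one token per Han character plus lowercased Latin/digit runs for text, and the whole value as a single token for keyword. A match query analyzes the query the same way and hits if any query token is indexed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified analyzers; NOT Lucene's real StandardAnalyzer or KeywordAnalyzer.
final class TextVsKeywordSketch {

    /** Roughly like the standard analyzer: one token per Han character, lowercased Latin/digit runs. */
    static List<String> standardLike(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens); // whitespace or punctuation ends a word
            }
        });
        flush(word, tokens);
        return tokens;
    }

    /** Like the keyword analyzer: the whole value is one token. */
    static List<String> keywordLike(String s) {
        return List.of(s);
    }

    /** A match query hits when any analyzed query token is among the indexed tokens. */
    static boolean match(List<String> indexedTokens, List<String> queryTokens) {
        Set<String> indexed = new HashSet<>(indexedTokens);
        return queryTokens.stream().anyMatch(indexed::contains);
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }
}
```

Indexed as text, "测试keyword和text是否支持分词" yields tokens such as 测, 试 and keyword, so "测试" matches; indexed as keyword it is a single token, so only the exact full value matches.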
    // test2 already exists from step 3, so delete it before re-creating it with a new mapping
    DELETE /test2

    // define the field types
    PUT /test2
    {
      "mappings": {
        "properties": {
          "text": { "type": "text" },
          "keyword": { "type": "keyword" }
        }
      }
    }

    // add a document
    PUT /test2/_doc/1
    {
      "text": "测试keyword和text是否支持分词",
      "keyword": "测试keyword和text是否支持分词"
    }

    GET /test2/_doc/_search
    {
      "query": {
        "match": { "text": "测试" }
      }
    }

Response (the text field is analyzed, so it matches):

    {
      "took" : 426, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 1, "relation" : "eq" },
        "max_score" : 0.5753642,
        "hits" : [
          { "_index" : "test2", "_type" : "_doc", "_id" : "1", "_score" : 0.5753642,
            "_source" : { "text" : "测试keyword和text是否支持分词", "keyword" : "测试keyword和text是否支持分词" } }
        ]
      }
    }

    GET /test2/_doc/_search
    {
      "query": {
        "match": { "keyword": "测试" }
      }
    }

Response (the keyword field is not analyzed, so there is no match):

    {
      "took" : 0, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : { "total" : { "value" : 0, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }
    }

    GET _analyze
    {
      "analyzer": "keyword",
      "text": ["白日依山尽"]
    }

Response:

    {
      "tokens" : [
        { "token" : "白日依山尽", "start_offset" : 0, "end_offset" : 5, "type" : "word", "position" : 0 }
      ]
    }

    GET _analyze
    {
      "analyzer": "standard",
      "text": ["白日依山尽"]
    }

Response:

    {
      "tokens" : [
        { "token" : "白", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
        { "token" : "日", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
        { "token" : "依", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2 },
        { "token" : "山", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3 },
        { "token" : "尽", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 4 }
      ]
    }

    GET _analyze
    {
      "analyzer": "ik_max_word",
      "text": ["白日依山尽"]
    }

Response:

    {
      "tokens" : [
        { "token" : "白日依山尽", "start_offset" : 0, "end_offset" : 5, "type" : "CN_WORD", "position" : 0 },
        { "token" : "白日", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 1 },
        { "token" : "依", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 2 },
        { "token" : "山", "start_offset" : 3, "end_offset" : 4, "type" : "CN_CHAR", "position" : 3 },
        { "token" : "尽", "start_offset" : 4, "end_offset" : 5, "type" : "CN_CHAR", "position" : 4 }
      ]
    }

⑥ Highlight query

    GET /test/_doc/_search
    {
      "query": {
        "match": { "name": "小" }
      },
      "highlight": {
        "fields": { "name": {} }
      }
    }

Response (by default the matched text is wrapped in <em></em>):

    {
      "took" : 89, "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 6, "relation" : "eq" },
        "max_score" : 0.18681718,
        "hits" : [
          { "_index" : "test", "_type" : "_doc", "_id" : "2", "_score" : 0.18681718, "_source" : { "name" : "小李", "age" : 19 }, "highlight" : { "name" : [ "<em>小</em>李" ] } },
          { "_index" : "test", "_type" : "_doc", "_id" : "3", "_score" : 0.18681718, "_source" : { "name" : "小张", "age" : 18 }, "highlight" : { "name" : [ "<em>小</em>张" ] } },
          { "_index" : "test", "_type" : "_doc", "_id" : "4", "_score" : 0.18681718, "_source" : { "name" : "小明", "age" : 16 }, "highlight" : { "name" : [ "<em>小</em>明" ] } },
          { "_index" : "test", "_type" : "_doc", "_id" : "6", "_score" : 0.18681718, "_source" : { "name" : "小黄", "age" : 16 }, "highlight" : { "name" : [ "<em>小</em>黄" ] } },
          { "_index" : "test", "_type" : "_doc", "_id" : "7", "_score" : 0.18681718, "_source" : { "name" : "小黑", "age" : 16 }, "highlight" : { "name" : [ "<em>小</em>黑" ] } },
          { "_index" : "test", "_type" : "_doc", "_id" : "9", "_score" : 0.18681718, "_source" : { "name" : "小花", "age" : 16 }, "highlight" : { "name" : [ "<em>小</em>花" ] } }
        ]
      }
    }

Custom tags can be set with pre_tags and post_tags (substitute your own opening and closing markup for the placeholders):

    GET /test/_doc/_search
    {
      "query": {
        "match": { "name": "小" }
      },
      "highlight": {
        "pre_tags": "<your-opening-tag>",
        "post_tags": "<your-closing-tag>",
        "fields": { "name": {} }
      }
    }

The response ("took" : 2) contains the same six hits, except that each highlighted 小 is wrapped in the custom tags instead of <em></em>, e.g. "highlight" : { "name" : [ "<your-opening-tag>小<your-closing-tag>李" ] }.
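Conceptually, highlighting wraps each matched term of the field value in the configured tags. The sketch below is a hypothetical illustration that matches raw substrings (preferring the longest term at each position); Elasticsearch's highlighters instead work on the analyzed tokens and their offsets. The span tag in the usage example is an arbitrary choice, not one used by the original example.

```java
import java.util.List;

// Hypothetical illustration of highlighting; real highlighters work on analyzed tokens and offsets.
final class HighlightSketch {

    /** Wraps each occurrence of any query term in preTag/postTag, preferring the longest term at a position. */
    static String highlight(String text, List<String> terms, String preTag, String postTag) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String hit = null;
            for (String term : terms) {
                if (!term.isEmpty() && text.startsWith(term, i)
                        && (hit == null || term.length() > hit.length())) {
                    hit = term;
                }
            }
            if (hit != null) {
                out.append(preTag).append(hit).append(postTag);
                i += hit.length();
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

highlight("小李", List.of("小"), "<em>", "</em>") gives "<em>小</em>李", and passing "<span style='color:red'>" and "</span>" gives "<span style='color:red'>小</span>李".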

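VI. Spring Boot Integration

1. Import dependencies

Import the Elasticsearch starter:

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>

The client version should match the server version (7.6.1 here); with Spring Boot's dependency management it can be overridden in the pom's properties with <elasticsearch.version>7.6.1</elasticsearch.version>.

Also import fastjson and lombok up front:

    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.70</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>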

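2. Create the configuration class

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ElasticSearchConfig {

        // Register the REST high-level client as a Spring bean.
        // Spring infers close() as the destroy method, so the client is closed on shutdown.
        @Bean
        public RestHighLevelClient restHighLevelClient() {
            return new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")));
        }
    }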

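3. Create the entity class

    import java.io.Serializable;
    import lombok.AllArgsConstructor;
    import lombok.Data;
    import lombok.NoArgsConstructor;

    @Data
    @NoArgsConstructor
    @AllArgsConstructor
    public class User implements Serializable {
        private static final long serialVersionUID = -3843548915035470817L;
        private String name;
        private Integer age;
    }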

4. Inject RestHighLevelClient in a test class

    @Autowired
    private RestHighLevelClient restHighLevelClient;

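Index operations

1. Create an index

    // Use the request/response classes from the client.indices package
    import org.elasticsearch.client.indices.CreateIndexRequest;
    import org.elasticsearch.client.indices.CreateIndexResponse;

    public void createIndex() throws IOException {
        CreateIndexRequest request = new CreateIndexRequest("test6");
        CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());
        System.out.println(response);
        // Do not close the Spring-managed client here: other tests reuse it, and Spring closes it on shutdown.
    }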

2. Get an index and check whether it exists

    import org.elasticsearch.client.indices.GetIndexRequest;

    public void indexExists() throws IOException {
        GetIndexRequest request = new GetIndexRequest("test6");
        boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

3. Delete an index

    import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
    import org.elasticsearch.action.support.master.AcknowledgedResponse;

    public void deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("test6");
        AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());
    }

Document operations

1. Add a document

    public void addDocument() throws IOException {
        User user = new User("笑笑", 25);
        IndexRequest request = new IndexRequest("test");
        request.id("16");                                            // explicit document id
        request.timeout(TimeValue.timeValueMillis(1000));            // wait at most 1s
        request.source(JSON.toJSONString(user), XContentType.JSON);  // fastjson serializes the entity
        IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        System.out.println(response.status());                       // CREATED, or OK if the id already existed
        System.out.println(response);
    }

2. Get a document

    public void getDocument() throws IOException {
        GetRequest request = new GetRequest("test", "1");
        GetResponse response = restHighLevelClient.get(request, RequestOptions.DEFAULT);
        System.out.println(response.getSourceAsString()); // _source as a JSON string
    }

3. Check whether a document exists

    public void documentExists() throws IOException {
        GetRequest request = new GetRequest("test", "1111");
        // Only existence matters, so skip fetching _source and stored fields
        request.fetchSourceContext(new FetchSourceContext(false));
        request.storedFields("_none_");
        boolean exists = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

4. Update a document

    public void updateDocument() throws IOException {
        UpdateRequest request = new UpdateRequest("test", "16");
        User user = new User("黑黑", 18);
        // doc(...) sends a partial document that is merged into the existing _source
        request.doc(JSON.toJSONString(user), XContentType.JSON);
        UpdateResponse response = restHighLevelClient.update(request, RequestOptions.DEFAULT);
        System.out.println(response.status());
    }

5. Delete a document

    public void deleteDocument() throws IOException {
        DeleteRequest request = new DeleteRequest("test", "1");
        request.timeout("1s");
        DeleteResponse response = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
        System.out.println(response.status()); // NOT_FOUND if the id does not exist
    }

6. Search documents

    public void search() throws IOException {
        SearchRequest request = new SearchRequest("test");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Exact term query; name is an analyzed text field, so a single character such as "明" is an indexed term
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "明");
        // MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(termQueryBuilder);
        // searchSourceBuilder.query(matchAllQueryBuilder);
        // This HighlightBuilder has no fields configured; add .field("name") to get highlight fragments
        searchSourceBuilder.highlighter(new HighlightBuilder());
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        searchSourceBuilder.from(0);   // offset of the first hit
        searchSourceBuilder.size(100); // maximum number of hits
        request.source(searchSourceBuilder);
        SearchResponse search = restHighLevelClient.search(request, RequestOptions.DEFAULT);
        SearchHits hits = search.getHits();
        System.out.println(JSON.toJSONString(hits));
        System.out.println("++++++++++++++++++++++++++++++++++++++++");
        for (SearchHit documentFields : hits.getHits()) {
            System.out.println(documentFields.getSourceAsMap());
        }
    }

Incorrect bulk insert

Each call to source(...) replaces the previous one, and one IndexRequest always produces exactly one document, so this code indexes only the last user ("小7"):

    public void wrongBulk() throws IOException {
        IndexRequest request = new IndexRequest("bulk");
        // Every source(...) call below overwrites the one before it
        request.source(JSON.toJSONString(new User("小1", 12)), XContentType.JSON);
        request.source(JSON.toJSONString(new User("小2", 12)), XContentType.JSON);
        request.source(JSON.toJSONString(new User("小3", 12)), XContentType.JSON);
        request.source(JSON.toJSONString(new User("小4", 12)), XContentType.JSON);
        request.source(JSON.toJSONString(new User("小5", 12)), XContentType.JSON);
        request.source(JSON.toJSONString(new User("小6", 12)), XContentType.JSON);
        request.source(JSON.toJSONString(new User("小7", 12)), XContentType.JSON);
        IndexResponse indexResponse = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        System.out.println(indexResponse.status());
    }

7. Bulk insert

    public void testBulk() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");
        List<User> users = new ArrayList<>();
        users.add(new User("小1", 12));
        users.add(new User("小2", 12));
        users.add(new User("小3", 12));
        users.add(new User("小4", 12));
        users.add(new User("小5", 12));
        users.add(new User("小6", 12));
        for (User user : users) {
            // One IndexRequest per document; ids are generated automatically
            bulkRequest.add(new IndexRequest("bulk").source(JSON.toJSONString(user), XContentType.JSON));
        }
        BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(response.status());
        System.out.println(response.hasFailures()); // false if every item succeeded
    }
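Under the hood, a BulkRequest is sent to the _bulk endpoint as newline-delimited JSON: an action line followed by a source line for each document, with every line (including the last) ending in a newline. That is also why the incorrect version above yields only one document: a single IndexRequest contributes a single action/source pair. The hypothetical helper below builds such a body by hand for illustration only.

```java
import java.util.List;

// Hypothetical helper showing the NDJSON body format of the _bulk endpoint.
final class BulkBodySketch {

    /** One action line plus one source line per document; every line, including the last, ends with \n. */
    static String indexActions(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n"); // action metadata
            body.append(doc).append('\n');                                          // document source
        }
        return body.toString();
    }
}
```

A body built this way could be sent with POST /_bulk and the Content-Type application/x-ndjson; the RestHighLevelClient does this formatting for you.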

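VII. Hands-on: a JD.com-style product search (with highlighting)

1. Import dependencies

    <dependency><groupId>org.jsoup</groupId><artifactId>jsoup</artifactId><version>1.10.2</version></dependency>
    <dependency><groupId>com.alibaba</groupId><artifactId>fastjson</artifactId><version>1.2.70</version></dependency>
    <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-data-elasticsearch</artifactId></dependency>
    <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-thymeleaf</artifactId></dependency>
    <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-web</artifactId></dependency>
    <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-devtools</artifactId><scope>runtime</scope><optional>true</optional></dependency>
    <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-configuration-processor</artifactId><optional>true</optional></dependency>
    <dependency><groupId>org.projectlombok</groupId><artifactId>lombok</artifactId><optional>true</optional></dependency>
    <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-test</artifactId><scope>test</scope></dependency>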

2. Import the front-end assets

ES materials: https://pan.baidu.com/s/1qdvSk7SdVnlI8QzeK5gxaA (extraction code: ldrh)

3、编写 application.preperties配置文件

# Change the port to avoid conflicts
server.port=9999
# Disable the Thymeleaf template cache
spring.thymeleaf.cache=false

4. Test the controller and view

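import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;

@Controller
public class DemoApi {

    // Serve templates/index.html for both / and /index
    @GetMapping({"/", "/index"})
    public String index() {
        return "index";
    }
}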

5. Write the service

ContentService

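import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // 1. Parse the crawled data and put it into the es index
    public Boolean parseContent(String keyword) throws IOException {
        // Fetch the content
        List<Content> contents = HtmlParseUtil.parseJD(keyword);
        // An empty BulkRequest fails client-side validation, so bail out early
        if (contents.isEmpty()) {
            return false;
        }
        // Put the content into es
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m"); // adjust to your actual workload
        for (int i = 0; i < contents.size(); i++) {
            bulkRequest.add(new IndexRequest("jd_goods")
                    .id(String.valueOf(i + 1))
                    .source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
        }
        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        // Do not close the shared client here (see item 9)
        return !bulk.hasFailures();
    }

    // 2. Paged search on the name field by keyword
    public List<Map> search(String keyword, Integer pageIndex, Integer pageSize) throws IOException {
        if (pageIndex < 0) {
            pageIndex = 0;
        }
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        // Build the search source
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Exact (term) query on name: the keyword is NOT analyzed, so it must equal
        // one indexed token; matchQuery would analyze the keyword first
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
        searchSourceBuilder.query(termQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // 60s
        // Paging: from() takes a document offset, not a page number
        searchSourceBuilder.from(pageIndex * pageSize);
        searchSourceBuilder.size(pageSize);
        searchRequest.source(searchSourceBuilder);
        // Run the query (do not close the shared client afterwards)
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Collect the source of every hit
        List<Map> results = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            results.add(hit.getSourceAsMap());
        }
        return results;
    }

    // 3. Same as 2, with the matches in name highlighted
    public List<Map> highlightSearch(String keyword, Integer pageIndex, Integer pageSize) throws IOException {
        if (pageIndex < 0) {
            pageIndex = 0;
        }
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Exact (term) query
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        searchSourceBuilder.query(termQueryBuilder);
        // Paging
        searchSourceBuilder.from(pageIndex * pageSize);
        searchSourceBuilder.size(pageSize);
        // Highlighting: wrap each match in the name field with HTML markup
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("name");
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        // 0 = return the whole field as a single fragment, so no text is dropped
        highlightBuilder.numOfFragments(0);
        searchSourceBuilder.highlighter(highlightBuilder);
        // Run the query
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Parse the hits, overwriting name with its highlighted version
        List<Map> results = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField name = highlightFields.get("name");
            if (name != null) {
                StringBuilder newName = new StringBuilder();
                for (Text fragment : name.fragments()) {
                    newName.append(fragment);
                }
                sourceAsMap.put("name", newName.toString());
            }
            results.add(sourceAsMap);
        }
        return results;
    }
}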
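Both search methods use termQuery, and that choice matters. A term query looks the keyword up as one unanalyzed term. The name field, however, was indexed with the default standard analyzer, assuming no custom mapping (such as the IK analyzer) was created for jd_goods. That analyzer lowercases Latin words and emits each Chinese character as its own token. As a result, termQuery("name", "java") can hit, while "Java" or "手机" finds nothing. The JDK-only sketch below models this with a simplified tokenizer. It approximates Lucene's StandardTokenizer and does not reimplement it, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TermVsMatchSketch {

    // Simplified model of the standard analyzer: runs of letters/digits become
    // one lowercased token, each CJK ideograph becomes its own token, and any
    // other character separates tokens.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString().toLowerCase(Locale.ROOT));
            word.setLength(0);
        }
    }

    // termQuery model: the keyword is not analyzed; it must equal one token.
    static boolean termMatches(String fieldText, String keyword) {
        return analyze(fieldText).contains(keyword);
    }

    // matchQuery model (default OR operator): the keyword is analyzed and
    // any shared token counts as a hit.
    static boolean matchMatches(String fieldText, String keyword) {
        List<String> indexed = analyze(fieldText);
        for (String token : analyze(keyword)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "Java编程 手机";
        System.out.println(analyze(name));
        System.out.println(termMatches(name, "java"));  // lowercased token exists
        System.out.println(termMatches(name, "Java"));  // keyword not lowercased
        System.out.println(termMatches(name, "手机"));  // indexed as 手 and 机
        System.out.println(matchMatches(name, "手机")); // analyzed into 手, 机
    }
}
```

Switching to QueryBuilders.matchQuery("name", keyword) makes Chinese keywords match. With per-character tokens and the OR operator, though, the results are loose. The ik analyzer mentioned in the installation section (plugins directory) is the usual way to get word-level Chinese tokens.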

6. Write the controller

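import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.ResponseBody;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@Controller
public class DemoApi {

    @Autowired
    private ContentService contentService;

    @GetMapping({"/", "/index"})
    public String index() {
        return "index";
    }

    // Crawl JD for the keyword and index the results into es
    @ResponseBody
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
        return contentService.parseContent(keyword);
    }

    // Paged search without highlighting
    @ResponseBody
    @GetMapping("/search/{keyword}/{pageIndex}/{pageSize}")
    public List<Map> search(@PathVariable("keyword") String keyword,
                            @PathVariable("pageIndex") Integer pageIndex,
                            @PathVariable("pageSize") Integer pageSize) throws IOException {
        return contentService.search(keyword, pageIndex, pageSize);
    }

    // Paged search with the name field highlighted
    @ResponseBody
    @GetMapping("/h_search/{keyword}/{pageIndex}/{pageSize}")
    public List<Map> highlightSearch(@PathVariable("keyword") String keyword,
                                     @PathVariable("pageIndex") Integer pageIndex,
                                     @PathVariable("pageSize") Integer pageSize) throws IOException {
        return contentService.highlightSearch(keyword, pageIndex, pageSize);
    }
}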
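The search URLs take a pageIndex and a pageSize. SearchSourceBuilder.from() expects a zero-based document offset, not a page number, so the page index has to be multiplied by pageSize before it is passed in. Elasticsearch also rejects requests where from + size exceeds the index setting index.max_result_window, which defaults to 10000. The JDK-only sketch below shows both checks. The class and method names are hypothetical helpers and are not part of the tutorial's code.

```java
class PagingSketch {

    // Converts a zero-based page number into the document offset expected by
    // from(); negative pages are clamped to the first page.
    static int fromOffset(int pageIndex, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (pageIndex < 0) {
            pageIndex = 0;
        }
        return Math.multiplyExact(pageIndex, pageSize);
    }

    // Deep pages are refused once from + size exceeds index.max_result_window
    // (10000 unless the index setting was changed).
    static boolean withinResultWindow(int from, int size, int maxResultWindow) {
        return (long) from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        int pageSize = 20;
        for (int page = 0; page < 3; page++) {
            System.out.println("page " + page + " -> from " + fromOffset(page, pageSize));
        }
    }
}
```

The front end requests h_search/{keyword}/0/20, which maps to offset 0 and size 20. The next page would be /1/20, which maps to offset 20.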

7. Crawler (jsoup): HtmlParseUtil

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class HtmlParseUtil {

    public static void main(String[] args) throws IOException {
        // Requires network access
        String url = "http://search.jd.com/search?keyword=java";
        // 1. Fetch and parse the page (jsoup returns a browser-like Document)
        Document document = Jsoup.parse(new URL(url), 30000);
        // The Document supports the same DOM operations you would use in JavaScript
        // 2. Get the product list element by id
        Element goodsList = document.getElementById("J_goodsList");
        // 3. Get every li under J_goodsList
        Elements lis = goodsList.getElementsByTag("li");
        // 4. Read img, name and price from each li
        for (Element li : lis) {
            String img = li.getElementsByTag("img").eq(0).attr("src"); // first image in the li
            String name = li.getElementsByClass("p-name").eq(0).text();
            String price = li.getElementsByClass("p-price").eq(0).text();
            System.out.println("=======================");
            System.out.println("img : " + img);
            System.out.println("name : " + name);
            System.out.println("price : " + price);
        }
    }

    public static List<Content> parseJD(String keyword) throws IOException {
        // Requires network access
        String url = "http://search.jd.com/search?keyword=" + keyword;
        // 1. Fetch and parse the page
        Document document = Jsoup.parse(new URL(url), 30000);
        // 2. Get the product list element by id
        Element goodsList = document.getElementById("J_goodsList");
        List<Content> contents = new ArrayList<>();
        // If JD changes its markup or serves a different page (e.g. a login page),
        // J_goodsList is missing: return an empty list instead of throwing an NPE
        if (goodsList == null) {
            return contents;
        }
        // 3. Read img, name and price from every li under J_goodsList
        for (Element li : goodsList.getElementsByTag("li")) {
            // The site lazy-loads images, so read data-lazy-img instead of src
            String img = li.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String name = li.getElementsByClass("p-name").eq(0).text();
            String price = li.getElementsByClass("p-price").eq(0).text();
            contents.add(new Content(name, img, price));
        }
        System.out.println(contents);
        // 4. Return the list
        return contents;
    }
}
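parseJD pastes the keyword into the query string unchanged. Percent-encoding it first (UTF-8) keeps Chinese keywords, spaces and characters such as & from producing a malformed or misread URL. The JDK-only sketch below shows the encoding step. buildSearchUrl is a hypothetical helper, and it assumes Java 10+ for the URLEncoder.encode(String, Charset) overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class JdUrlSketch {

    // Hypothetical helper: builds the JD search URL with the keyword
    // percent-encoded as UTF-8. URLEncoder uses form encoding, so a space
    // becomes '+' and '&' becomes %26.
    static String buildSearchUrl(String keyword) {
        return "http://search.jd.com/search?keyword="
                + URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("java"));
        System.out.println(buildSearchUrl("手机"));
    }
}
```

Inside parseJD, the same call would replace the plain concatenation before new URL(url) is built.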

Content

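import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Content implements Serializable {
    private static final long serialVersionUID = -8049497962627482693L;
    private String name;
    private String img;
    private String price;
}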

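8. Separate the front end from the back end: include the JS libraries

Include Vue and axios in index.html with <script> tags; the page script below depends on both.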

The modified index.html:

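[The index.html markup was lost in extraction. Its surviving visible text is: the page title "狂神说Java-ES仿京东实战" (狂神说Java: ES JD.com clone demo), a "天猫搜索" (Tmall search) box, a "品牌" (brand) filter, the sort tabs "综合 人气 新品 销量 价格" (overall, popularity, new arrivals, sales, price), and sample product-card text "店铺: 狂神说Java" (shop: 狂神说Java) and "月成交999笔 评价 3" (999 sales this month, 3 reviews).]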

new Vue({
    el: "#app",
    data: {
        keyword: '', // search keyword
        results: []  // results returned by the back end
    },
    methods: {
        searchKey() {
            var keyword = this.keyword;
            console.log(keyword);
            // Highlighted search: first page, 20 results
            axios.get('h_search/' + keyword + '/0/20').then(response => {
                console.log(response.data);
                this.results = response.data;
            });
        }
    }
});

9. Open issue

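Calling restHighLevelClient.close() inside a request method causes java.lang.RuntimeException: Request execution cancelled. The RestHighLevelClient is a single Spring-managed bean shared by every request. Closing it after one request shuts down its underlying HTTP client, so every later request on the same instance fails. Do not close it in service or controller code; let the Spring container close it when the application shuts down.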
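The failure comes from the client's lifecycle, not from the query itself. The JDK-only model below reproduces the pattern with a hypothetical SharedClient class that stands in for RestHighLevelClient. It is not the Elasticsearch API: the exception type and message are only chosen to mirror the symptom. One instance is shared like a Spring singleton. After a handler closes it, the next request fails.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SharedClientSketch {

    // Hypothetical stand-in for a long-lived client that owns resources.
    static final class SharedClient implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        String search(String keyword) {
            if (closed.get()) {
                // Mirrors the symptom: work on a closed client is rejected
                throw new IllegalStateException("Request execution cancelled");
            }
            return "results for " + keyword;
        }

        @Override
        public void close() {
            closed.set(true);
        }
    }

    // Anti-pattern: a request handler that closes the shared instance.
    static String handleAndClose(SharedClient client, String keyword) {
        String result = client.search(keyword);
        client.close();
        return result;
    }

    // Correct pattern: the handler only uses the client; its owner closes it.
    static String handle(SharedClient client, String keyword) {
        return client.search(keyword);
    }

    public static void main(String[] args) {
        SharedClient client = new SharedClient(); // one instance, like a singleton bean
        System.out.println(handleAndClose(client, "java"));
        try {
            handle(client, "java");
        } catch (IllegalStateException e) {
            System.out.println("second request failed: " + e.getMessage());
        }
    }
}
```

In the Spring application, the owner is the container. For a client declared as a @Bean, Spring by default calls its public close() method at shutdown, so request code never needs to close it.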