Skip to main content
 首页 » 编程设计

通过SQL访问Elasticsearch

2022年07月19日140Free-Thinker

Elasticsearch SQL是可以通过SQL方式查询Elasticsearch的组件。你能认为它是转换器,既能理解SQL,又能理解Elasticsearch,并利用Elasticsearch功能易于实现实时读取和处理数据。

1. Elasticsearch SQL

原生整合
Elasticsearch SQL是为Elasticsearch官方提供,根据底层存储对相关节点有效地执行每个查询。
无外部组件
不需要额外的硬件,进程,运行时库去查询Elasticsearch;通过在Elasticsearch集群中运行,Elasticsearch SQL消除了额外的组件。
轻量有效
Elasticsearch SQL并不抽象Elasticsearch及其搜索功能——相反,它包含并公开SQL,以同样实时的、声明式、简洁的方式支持全文搜索。

2. 示例

增加示例数据:

PUT /library/_bulk?refresh 
{"index":{"_id": "Leviathan Wakes"}} 
{"name": "Leviathan Wakes", "author": "James S.A. Corey", "release_date": "2011-06-02", "page_count": 561} 
{"index":{"_id": "Hyperion"}} 
{"name": "Hyperion", "author": "Dan Simmons", "release_date": "1989-05-26", "page_count": 482} 
{"index":{"_id": "Dune"}} 
{"name": "Dune", "author": "Frank Herbert", "release_date": "1965-06-01", "page_count": 604} 

现在执行SQL查询:

POST /_sql?format=txt 
{ 
  "query": "SELECT * FROM library WHERE release_date < '2000-01-01'" 
} 

这里format表示输出格式,还有其他类型,如json、csv、yaml等。txt以表格方式进行展示便于阅读。

    author     |     name      |  page_count   | release_date 
---------------+---------------+---------------+------------------------ 
Dan Simmons    |Hyperion       |482            |1989-05-26T00:00:00.000Z 
Frank Herbert  |Dune           |604            |1965-06-01T00:00:00.000Z 

2.1 输出json列式数据

增加"columnar": true参数,结果以列式输出,一般用于图表展示数据。

POST /_sql?format=json 
{ 
  "query": "SELECT * FROM library ORDER BY page_count DESC", 
  "fetch_size": 5, 
  "columnar": true 
} 
{ 
  "columns": [ 
    {"name": "author", "type": "text"}, 
    {"name": "name", "type": "text"}, 
    {"name": "page_count", "type": "short"}, 
    {"name": "release_date", "type": "datetime"} 
  ], 
  "values": [ 
    ["Peter F. Hamilton", "Vernor Vinge", "Frank Herbert", "Alastair Reynolds", "James S.A. Corey"], 
    ["Pandora's Star", "A Fire Upon the Deep", "Dune", "Revelation Space", "Leviathan Wakes"], 
    [768, 613, 604, 585, 561], 
    ["2004-03-02T00:00:00.000Z", "1992-06-01T00:00:00.000Z", "1965-06-01T00:00:00.000Z", "2000-03-15T00:00:00.000Z", "2011-06-02T00:00:00.000Z"] 
  ], 
  "cursor": "sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWWWdrRlVfSS1TbDYtcW9lc1FJNmlYdw==:BAFmBmF1dGhvcgFmBG5hbWUBZgpwYWdlX2NvdW50AWYMcmVsZWFzZV9kYXRl+v///w8=" 
} 

cursor是游标,用于读取下一页数据。但后续调用仍然需要columnar参数:

POST /_sql?format=json 
{ 
  "cursor": "sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWWWdrRlVfSS1TbDYtcW9lc1FJNmlYdw==:BAFmBmF1dGhvcgFmBG5hbWUBZgpwYWdlX2NvdW50AWYMcmVsZWFzZV9kYXRl+v///w8=", 
  "columnar": true 
} 

2.2 传输查询参数

我们可以在SQL语句中直接写具体值进行查询:

POST /_sql?format=txt 
{ 
	"query": "SELECT YEAR(release_date) AS year FROM library WHERE page_count > 300 AND author = 'Frank Herbert' GROUP BY year HAVING COUNT(*) > 0" 
} 

但也可以抽取参数在单独行,值用占位符代替:

POST /_sql?format=txt 
{ 
	"query": "SELECT YEAR(release_date) AS year FROM library WHERE page_count > ? AND author = ? GROUP BY year HAVING COUNT(*) > ?", 
	"params": [300, "Frank Herbert", 0] 
} 

2.3 REST 参数

除了query和fetch_size外,还有其他一些参数,如请求超时等。下面列出常用参数:

参数名称 缺省值 描述
query 必须 执行的查询sql
fetch_size 1000 返回最大行数
filter 可选,使用query dsl增加过滤功能
request_timeout 90s 请求超时时间
columnar 1000 返回列式结果,仅这些格式可用:json, yaml, cbor, smile
params 列出占位符参数的值

2.4 函数和操作符

内容比较多,一般理解SQL语法掌握比较容易,读者直接阅读官网文档吧。

3. 总结

本文通过示例简要介绍SQL访问Elasticsearch。对于灵活定义sql输出结果以图表方式(如,eCharts)展示比较方便。


本文参考链接:https://blog.csdn.net/neweastsun/article/details/110338763
阅读延展