How To Use The Search API For The Python Elasticsearch Client

Introduction

Elasticsearch is widely known for its fast, efficient full-text search queries. While these queries can be executed from the command line using cURL, there are a number of clients available that allow you to work with Elasticsearch from many popular programming languages. In this article, we’ll focus on the Elasticsearch Search API for Python and provide examples of different Elasticsearch Python queries.

Prerequisites

Before we can begin looking at some Python code, it’s important to review the system requirements for this tutorial. The following prerequisites are necessary in order to query Elasticsearch documents in Python:

  • Python must be installed, although most current operating systems come with it. The preferred version is Python 3, and the examples in this article assume that this version is being used.

  • Elasticsearch must be installed and running. The default port for the service is 9200; you can verify that Elasticsearch is running with a simple cURL request in a terminal or command prompt window:

curl -XGET localhost:9200
  • You’ll also need to have at least one index containing a few documents to test your search queries.

  • The Python client for Elasticsearch needs to be installed, as well as the PIP package manager for Python. You can use the pip3 command to install the Elasticsearch library for Python 3:

pip3 install elasticsearch
  • Open IDLE by typing “idle3” into a terminal, or open a Python interpreter by typing “python3”. Then, use the following commands to get the version of the Elasticsearch client:
import elasticsearch
print (elasticsearch.VERSION)

In response, you’ll receive a tuple object containing three numbers. The first number will represent the major version of the library.

  • You’ll need to have remote SSH access to the server where Elasticsearch is running or have a localhost server running for development. Use a terminal-based text editor like nano to edit your Python script if you are accessing a server remotely.

  • It’s helpful to have some knowledge of Python and its language syntax before beginning this tutorial.

Set up the Python script for the Elasticsearch client

Now that we’ve covered all the system requirements, it’s time to turn our attention to the code. The first step is to create a new Python script that will be used to make calls to the Elasticsearch client. This can be done using the touch command in a terminal window, followed by the file name. Be sure that the file name uses the .py file extension (e.g. "touch my_python_script.py").

You can edit the script in any IDE that supports Python indentation. If you’d prefer to edit the file in a terminal window, use nano to edit the file:

nano my_python_script.py

NOTE: When you’ve finished editing a script in nano, press CTRL+O to save your changes, and then CTRL+X to close the editor.

You can also test the code locally using Python’s IDLE environment. Just open a terminal and type: idle3.

First, import the Elasticsearch client library for Python. You may also include the optional shebang and encoding lines at the top of the script:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

Next, we’ll create a new client instance using that library we just imported:

# domain name, or server's IP address, goes in the 'hosts' list
elastic_client = Elasticsearch(hosts=["localhost"])

Create a Python dictionary for Elasticsearch search query

We’ll need to create a Python dictionary that will be passed to the client’s search() method. This dictionary will contain key-value pairs that represent the search parameters, the fields to be searched and the values.

The dictionary will be passed to the body parameter of the method. The first key should be the Elasticsearch "query" field.

Elasticsearch _search query in the form of a Python dictionary

It’s easier to understand the structure of a Python dictionary when you can see it used in an example. Here’s what a basic search query would look like in a Python script:

query_body = {
  "query": {
      "match": {
          "some_field": "search_for_this"
      }
  }
}

Python dictionaries map closely to JSON objects, so the Kibana version of the query shown above would look almost identical, except you would replace the query_body declaration with GET some_index/_search.
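To see how closely the two formats line up, here is a minimal sketch that uses only Python’s standard json module to serialize the same query dictionary. It does not contact Elasticsearch; the field name and search value are the same placeholders used above.

```python
import json

# The same placeholder query used in the example above
query_body = {
    "query": {
        "match": {
            "some_field": "search_for_this"
        }
    }
}

# Serialize the dictionary to JSON text, the form a request body
# takes in Kibana or cURL
query_json = json.dumps(query_body, indent=2)
print(query_json)

# Parsing the JSON text gives back an equal Python dictionary
round_trip = json.loads(query_json)
print(round_trip == query_body)
```

The printed JSON can be pasted under a GET some_index/_search line in the Kibana Console.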

Instantiate the Python dictionary while calling the search() method

You can pass the dictionary data for the query directly to the search method at the time of the call.

The two parameters you’ll typically pass to the Search API in Python are the index you want to search and the body of the Elasticsearch query:

elastic_client.search(index="some_index", body=some_query)

Get all documents in an Elasticsearch index using the match_all search parameter

In our next example, we’ll create a query to get all the documents in a particular index. We’ll use the Elasticsearch "match_all" option in the Python dictionary query to accomplish this.

The example below has the query passed into the method call directly. It returns a result object containing all of the documents returned by the API call:

result = elastic_client.search(
    index="some_index",
    body={
        "query": {
            "match_all": {}
        }
    }
)

The indented example shown above is clearly more readable, but it’s possible to execute the same call with just one line of code:

result = elastic_client.search(index="some_index", body={"query": {"match_all": {}}})

The Kibana Console equivalent of this same query is shown below:

GET some_index/_search
{
  "query": {
    "match_all": {}
  }
}

In many real-world applications, there would likely be an HTTP POST of the query parameters to the Python script. Those parameters would then be passed to the body parameter of the method, which is used to perform an Elasticsearch search in Python.

Let’s see how the data from an HTTP request could be processed and passed to the method. In the following example, we pass a string to the dictionary:

# User makes a request on client side
user_request = "some_param"

# Take the user's parameters and put them into a
# Python dictionary structured as an Elasticsearch query:
query_body = {
  "query": {
    "bool": {
      "must": {
        "match": {      
          "some_field": user_request
        }
      }
    }
  }
}

# Pass the query dictionary to the 'body' parameter of the
# client's search() method, and have it return results:
result = elastic_client.search(index="some_index", body=query_body)

Any documents returned by the query shown above must match the value passed to "some_field" in the "must" clause. The response from the API, including the "hits" for the search query, is stored in the result variable.

Here’s what the same _search request would look like when executed in Kibana:

GET some_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "some_field": "some_param"
        }
      }
    }
  }
}
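A "must" clause can also hold a list of queries, in which case a document has to satisfy every one of them. The sketch below shows one way to build such a query from several user-supplied parameters. The helper name build_must_query and the second field name are hypothetical, chosen only for illustration, and the code builds the dictionary without contacting Elasticsearch.

```python
def build_must_query(field_values):
    """Build a bool query where every field/value pair must match.

    Hypothetical helper for illustration; not part of the
    Elasticsearch client library.
    """
    # One "match" clause per field/value pair
    must_clauses = [
        {"match": {field: value}}
        for field, value in field_values.items()
    ]
    return {"query": {"bool": {"must": must_clauses}}}


# Placeholder parameters standing in for data from a user's request
user_params = {
    "some_field": "some_param",
    "another_field": "another_param",
}

query_body = build_must_query(user_params)
print(query_body)
```

The resulting query_body can be passed to the body parameter of search() in the same way as the hand-written dictionaries above.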

Have the Elasticsearch search() method return a Python list of the document ‘hits’

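When you call the search() method, it returns a dictionary-like response object. The matching documents, or "hits", are stored as a list under its ["hits"]["hits"] key. Each hit is its own dictionary containing metadata such as "_index", "_id" and "_score", with the document’s fields and values under the "_source" key.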

Get the results for an Elasticsearch query nested inside the ["hits"]["hits"] dictionary key

What this means is that the results take the form of a nested dictionary: a dictionary made up of dictionaries. To access the list of hits, use ["hits"]["hits"] as seen in the following example:

result = elastic_client.search(index="some_index", body=query_body)
print ("query hits:", result["hits"]["hits"])
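If you don’t have a cluster handy, the nesting is easier to follow with a concrete value. The dictionary below is a hand-written sample shaped like a typical search response; the IDs, scores and field values are made up for illustration and are not real server output.

```python
# Hand-written sample shaped like a search response (values are made up)
sample_result = {
    "took": 3,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "some_index", "_id": "1", "_score": 1.0,
             "_source": {"some_field": "first value"}},
            {"_index": "some_index", "_id": "2", "_score": 0.8,
             "_source": {"some_field": "second value"}},
        ],
    },
}

# The outer "hits" key holds metadata; the inner "hits" key holds the list
hits = sample_result["hits"]["hits"]
first_hit = hits[0]

# Each hit carries metadata plus the original document under "_source"
print("document ID:", first_hit["_id"])
print("field value:", first_hit["_source"]["some_field"])
```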

Have the Python search() method return more than just 10 “hits”

By default, Elasticsearch search queries will return only 10 hits, even if more matches are found. If you’d like to have the Python client return more, you can pass the optional size parameter when calling the search() method:

result = elastic_client.search(index="some_index", body=query_body, size=999)
print ("total hits:", len(result["hits"]["hits"]))

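NOTE: Python’s len() function returns the number of "hits" included in this response, which is capped by the size parameter. The total number of matching documents is reported separately under result["hits"]["total"].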

Screenshot of Python's IDLE making a search request and having it return more than 10 hits
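The number of hits in a response and the number of documents that matched the query are two different figures. The sketch below reads both from a response dictionary. The helper name count_hits and the sample responses are assumptions made for illustration. The helper accepts both shapes of the "total" value: Elasticsearch 7 and later report it as a dictionary with "value" and "relation" keys, while earlier versions report a plain integer.

```python
def count_hits(result):
    """Return (hits returned, total matches) for a search response.

    Hypothetical helper for illustration; not part of the
    Elasticsearch client library.
    """
    returned = len(result["hits"]["hits"])
    total = result["hits"]["total"]
    # Newer versions wrap the total in a dictionary;
    # older versions give a plain integer
    if isinstance(total, dict):
        total = total["value"]
    return returned, total


# Hand-written sample responses (not real server output)
ten_hits = [{"_id": str(n)} for n in range(10)]
newer_style = {"hits": {"total": {"value": 25, "relation": "eq"}, "hits": ten_hits}}
older_style = {"hits": {"total": 25, "hits": ten_hits}}

print(count_hits(newer_style))
print(count_hits(older_style))
```

When the second number is larger than the first, raising size (or paging through the results) is what brings back the remaining documents.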

Iterate through the Elasticsearch documents returned by the Search API in Python

After you make the call, you can take the result object and get the list of documents stored under its ["hits"]["hits"] dictionary key:

# returns 4 different keys: "took", "timed_out", "_shards", and "hits"
result = elastic_client.search(index="some_index", body=query_body)
all_hits = result['hits']['hits']

Now we can take that all_hits list and iterate through it. Python’s built-in enumerate() function yields each document together with its position in the list, which is handy when you want a running count while looping:

# iterate the nested dictionaries inside the ["hits"]["hits"] list
for num, doc in enumerate(all_hits):
    print ("DOC ID:", doc["_id"], "--->", doc, type(doc), "\n")
   
    # Use 'iteritems()' instead of 'items()' if using Python 2
    for key, value in doc.items():
        print (key, "-->", value)
   
    # print a few spaces between each doc for readability
    print ("\n\n")

By doing this, you can then access all of the attributes for each document’s dictionary using the Python built-in items() method for dictionaries (if you’re using Python 2.x, use the iteritems() method instead).

Screenshot of a Python script iterating through a search query to an Elasticsearch cluster with an output to terminal
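In practice you’ll often want only the stored fields of each document rather than the whole hit. The following sketch collects the position, ID and "_source" fields of each hit into a list. The helper name summarize_hits and the sample hits are hypothetical, so the code runs without a cluster.

```python
def summarize_hits(all_hits):
    """Collect (position, document ID, source fields) for each hit.

    Hypothetical helper for illustration; not part of the
    Elasticsearch client library.
    """
    rows = []
    # start=1 makes the running count begin at 1 instead of 0
    for num, doc in enumerate(all_hits, start=1):
        rows.append((num, doc["_id"], doc.get("_source", {})))
    return rows


# Hand-written sample hits (not real server output)
sample_hits = [
    {"_index": "some_index", "_id": "a1", "_source": {"some_field": "first value"}},
    {"_index": "some_index", "_id": "b2", "_source": {"some_field": "second value"}},
]

rows = summarize_hits(sample_hits)
for num, doc_id, source in rows:
    for field, value in source.items():
        print(num, doc_id, field, "-->", value)
```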

Conclusion

When you’re using Elasticsearch to store and search your data, it’s important to know how to use the service’s fast, efficient query functionality. The Elasticsearch Python client makes it easy to construct the queries you need from a Python script and process the returned results. With the instructions provided in this article, you’ll have no trouble querying Elasticsearch documents in Python using the Search API.

In this article, we reviewed the example code one segment at a time. Here’s what the complete Python script looks like:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

# domain name, or server's IP address, goes in the 'hosts' list
elastic_client = Elasticsearch(hosts=["localhost"])

# User makes a request on client side
user_request = "some_param"

# Take the user's parameters and put them into a Python
# dictionary structured like an Elasticsearch query:
query_body = {
  "query": {
    "bool": {
      "must": {
        "match": {      
          "some_field": user_request
        }
      }
    }
  }
}

# call the client's search() method, and have it return results
result = elastic_client.search(index="some_index", body=query_body)

# see how many "hits" it returned using the len() function
print ("total hits:", len(result["hits"]["hits"]))


'''
MAKE ANOTHER CALL THAT RETURNS
MORE THAN 10 HITS BY USING THE 'size' PARAM
'''

result = elastic_client.search(index="some_index", body=query_body, size=999)
all_hits = result['hits']['hits']

# see how many "hits" it returned using the len() function
print ("total hits using 'size' param:", len(result["hits"]["hits"]))

# iterate the nested dictionaries inside the ["hits"]["hits"] list
for num, doc in enumerate(all_hits):
    print ("DOC ID:", doc["_id"], "--->", doc, type(doc), "\n")

    # Use 'iteritems()' instead of 'items()' if using Python 2
    for key, value in doc.items():
        print (key, "-->", value)

    # print a few spaces between each doc for readability
    print ("\n\n")
