How to Index Elasticsearch Documents Using the Python Client Library
This article provides an overview of how to index Elasticsearch documents using Python. It also gives examples of how to create a nested JSON document and how to use Kibana to verify the document after it has been indexed.
NOTE: To prevent namespace conflicts with Python's keywords and built-in names, the Elasticsearch client uses from_ instead of from and doc_type instead of type for parameters when creating API calls.
Refer to the official API documentation for additional information.
- Python and the low-level Elasticsearch client library for Python must be installed on your machine.
- Python 3 is recommended, as Python 2 reached its end of life in 2020. Because the examples in this tutorial use Python 3, there may be compatibility issues if you try to run the given code with Python 2.
- Ensure that Elasticsearch is running correctly. With cURL, you can check this by performing a GET request against the server:
curl -XGET "http://localhost:9200" # you may need to modify your IP address and port.
Alternatively, you can use Python's requests library to confirm that the server is running Elasticsearch:
import requests
res = requests.get('http://localhost:9200')
- An Elasticsearch index that you can index documents into should already exist. The following cURL request can be used to obtain a full list of the indexes:
curl -XGET "http://localhost:9200/_cat/indices?v"
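- If you want to set up a custom UUID for the document's id, like the example in this tutorial, you will need to import Python's uuid module. The head of the script should resemble this:
#-*- coding: utf-8 -*-
from elasticsearch import Elasticsearch
import uuid
elastic = Elasticsearch('http://localhost:9200') # assumed client setup for the examples that follow; adjust the host and port as needed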
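The two checks described in the prerequisites (is the server up, and which indexes exist) can also be sketched in Python. This is a minimal illustration rather than part of the original tutorial: the helper names `get_version` and `list_index_names` are hypothetical, the `get` argument is assumed to behave like `requests.get`, and `http://localhost:9200` is assumed to be the server address.

```python
def get_version(get, base_url="http://localhost:9200"):
    """Return the Elasticsearch version string, or None on a non-200 reply.

    `get` is any callable that behaves like requests.get.
    """
    res = get(base_url)
    if res.status_code != 200:
        return None
    # The root endpoint answers with a JSON banner that includes version.number.
    return res.json().get("version", {}).get("number")


def list_index_names(get, base_url="http://localhost:9200"):
    """Return the sorted index names reported by the _cat/indices endpoint."""
    res = get(base_url + "/_cat/indices?format=json")
    res.raise_for_status()
    return sorted(row["index"] for row in res.json())
```

With the requests library installed, `get_version(requests.get)` and `list_index_names(requests.get)` should report the server version and the available indexes. If the server cannot be reached at all, `requests.get` raises a connection error rather than returning a response.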
Creating a nested JSON document
Elasticsearch uses JSON objects both for passing data and for its API responses. It is therefore a good idea to familiarize yourself with the way Elasticsearch indexes and documents are arranged.
Below is the standard layout of an Elasticsearch document. It is essentially a nested JSON object in which metadata about the index accompanies the document's data, which is held in the "doc" field (or "key", as fields are called in a Python dictionary):
{
    "_index" : "employees",
    "_id" : "12345",
    "doc_type" : "person",
    "doc" : {
        "name" : "George Peterson"
    }
}
However, when indexing an Elasticsearch document using a Python dictionary, you don't need all of the index metadata contained in the JSON document itself, because that information is passed as parameters when the client's index method is invoked.
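To make the split between metadata and body concrete, here is a small sketch that takes a dictionary in the nested layout this tutorial describes and separates it into the parameters for the index call and the document body. The function name `split_for_index_call` is hypothetical, and the field names follow this tutorial's layout rather than any guaranteed server format.

```python
def split_for_index_call(full_document):
    """Separate index metadata from the document body.

    Returns (params, body): params holds the values that would be passed as
    keyword arguments to the client's index method, body holds the data.
    """
    params = {
        "index": full_document["_index"],
        "doc_type": full_document["doc_type"],
        "id": full_document["_id"],
    }
    return params, full_document["doc"]


# The layout described in this tutorial, written as a Python dictionary.
full_document = {
    "_index": "employees",
    "_id": "12345",
    "doc_type": "person",
    "doc": {"name": "George Peterson", "age": "34", "years": "10"},
}

params, body = split_for_index_call(full_document)
print(params)  # metadata that travels as parameters
print(body)    # only the document's own fields
```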
Here is what that same JSON document should look like when it is represented as a Python dictionary object (before it gets passed to the index method):
new_employee = {
    "name": "George Peterson",
    "age": "34", # use "" if this is meant to be a text _type
    "years": "10" # use "" if this is meant to be a text _type
}
The Index Method’s API Call
Passing the optional
- Here is an example of an index API call that passes the dictionary object as the document's body and a randomly generated UUID as the Elasticsearch document's id, and stores the result in response:
response = elastic.index(
    index='employees',
    doc_type='person',
    id=uuid.uuid4(),
    body=new_employee
)
- Here is the same document body with the "age" and "years" fields passed as integer data types instead of text (or "string") data types:
new_employee = {
    "name": "George Peterson",
    "age": 34, # this is now an integer data type
    "years": 10 # this is now an integer data type
}
NOTE: If a document's data field is mapped as an "integer", its value should not be enclosed in quotation marks ("), as with the "age" and "years" fields in this example. If the field's value is placed inside quotation marks, Elasticsearch will index that field's datum as if it were a "text" data type.
Use Kibana to verify the document
- Open up the Kibana Console UI in your browser (typically located at
If the service isn’t running, type
sudo service kibana start into your Linux terminal.
- Use this GET request to view the index's documents:
GET employees/_search
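The same verification can be approximated from Python instead of Kibana. The sketch below is an assumption-laden illustration: `list_documents` is a hypothetical helper, and `client` is assumed to be an Elasticsearch client instance whose `search` method accepts `index` and `body` arguments, as in the client versions that also accept `doc_type`.

```python
def list_documents(client, index_name):
    """Return the _source of each hit from a match_all search on the index.

    Only the first page of results is returned (10 hits by default).
    """
    response = client.search(
        index=index_name,
        body={"query": {"match_all": {}}},
    )
    # Each hit carries its metadata plus the stored document under "_source".
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Example usage (requires a running cluster and a client instance):
# for document in list_documents(elastic, "employees"):
#     print(document)
```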
Dynamically created Elasticsearch document “_id”
In the prior example, the Python code uses the built-in uuid library to create a unique UUID for the document's id.
It is important to note that if the API call doesn't explicitly pass an id for the newly created document, the Elasticsearch cluster will dynamically create an alphanumeric ID for it.
Indexing without passing an “_id” parameter
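Here is a minimal sketch of the two cases: passing an explicit id versus letting the cluster assign one. The function name `index_employee` is hypothetical, and `client` is assumed to be an `Elasticsearch` instance (or anything with a compatible `index` method) that accepts the `index`, `doc_type`, `id`, and `body` parameters used in this tutorial; newer client versions may differ.

```python
def index_employee(client, document, doc_id=None):
    """Index one document, passing id only when the caller supplies one."""
    params = {"index": "employees", "doc_type": "person", "body": document}
    if doc_id is not None:
        # Convert UUID objects (or anything else) to a plain string id.
        params["id"] = str(doc_id)
    # Without an "id" parameter the cluster generates the document's _id.
    return client.index(**params)


# Example usage (requires a running cluster, a client instance, and import uuid):
# response = index_employee(elastic, new_employee, uuid.uuid4())
# response = index_employee(elastic, new_employee)  # id chosen by Elasticsearch
# print(response["_id"])
```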
Print the index response
It is good to get into the habit of verifying the response returned when making these kinds of API calls.
Unlike some other languages, such as PHP, Python lets you print any object to the console without explicitly serializing it first, for example with print(response).
The built-in dir function returns the names of all of the methods and attributes of any object in Python, so print(dir(response)) lists them.
Here is what the response object ought to contain if the document was indexed successfully:
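As a rough illustration of what to look for, the sketch below checks a response dictionary. The sample values are made up for this example and are not output from a real cluster, although the keys shown (`_index`, `_id`, `_version`, `result`, `_shards`) are ones the index API normally returns. The helper name `indexing_succeeded` is hypothetical.

```python
def indexing_succeeded(response):
    """Return True if the response reports a created or updated document."""
    return response.get("result") in ("created", "updated")


# Hypothetical sample response; real values will differ.
sample_response = {
    "_index": "employees",
    "_type": "person",
    "_id": "12345",
    "_version": 1,
    "result": "created",
    "_shards": {"total": 2, "successful": 1, "failed": 0},
}

print(sample_response)                 # dictionaries print without serializing
print(sorted(sample_response.keys()))  # the fields available on the response
print(indexing_succeeded(sample_response))
```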
- One major advantage of the approach outlined in this tutorial for indexing Elasticsearch documents using Python is that the same technique can be used to scale data sets up or down. Bear in mind that every time you call the index method with a new or omitted id, even if the data has not changed since the last time the document was indexed, the program will create a new document with the same structure and body as the previous one. As a result, this has the potential to create an unlimited number of duplicate documents in the index.
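One way to limit such duplicates, offered here as a suggestion rather than something the original tutorial does, is to derive the id from the document's content so that re-indexing unchanged data targets the same id. When an existing id is reused, Elasticsearch overwrites that document instead of adding another. The helper name `content_id` and the use of `uuid.uuid5` are choices made for this sketch.

```python
import json
import uuid


def content_id(document):
    """Return a UUID string that is stable for identical document content."""
    # sort_keys makes the serialized form independent of key order.
    canonical = json.dumps(document, sort_keys=True)
    return str(uuid.uuid5(uuid.NAMESPACE_OID, canonical))


first = content_id({"name": "George Peterson", "age": 34, "years": 10})
second = content_id({"years": 10, "age": 34, "name": "George Peterson"})
changed = content_id({"name": "George Peterson", "age": 35, "years": 10})

print(first == second)   # same content, same id
print(first == changed)  # different content, different id
```

The resulting string could then be passed as the id argument of the index call in place of a random UUID.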