How To Use Python's JSON Library In Elasticsearch

Written by Data Pilot

August 14, 2019

Elasticsearch
JSON
Python

Introduction

If you’re working with Python and Elasticsearch, it’s important to make sure you’ve lined up all the tools you need to get the job done efficiently. One key tool is Python’s JSON library. This built-in library converts Python dict (dictionary) objects into JSON strings and vice versa, making it easy to pass data to Elasticsearch in the correct format. In this article, we’ll show you how to use the Python JSON library with Elasticsearch in your scripts. We’ll use Python to declare an Elasticsearch document as a Python dictionary, then convert that dictionary into a JSON string that can be passed to the Elasticsearch client’s API method calls.

Prerequisites

Let’s take a quick look at some key prerequisites that need to be taken care of before we can proceed with our task:

Make sure Python 3 is installed and working. Although Python 2.7 may still work for the code example in this article, this version of Python is deprecated and will ultimately lose support.
The PIP3 package manager for Python also needs to be installed on the machine that runs the Python script. We’ll be using PIP3 to install the Elasticsearch client for Python:

pip3 install elasticsearch

You’ll need to have some basic knowledge of Python and its syntax. We’ll cover some of Python’s keywords and functions, and the terminology and concepts may be confusing to a beginner Pythonista.
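Once the install finishes, it can be handy to confirm which client version Python actually sees. The snippet below is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is made up for this illustration and isn't part of any library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it's missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# prints None if the client hasn't been installed for this interpreter
print("elasticsearch client:", installed_version("elasticsearch"))
```

As a rule of thumb, the client's major version should line up with the major version of the Elasticsearch cluster you're connecting to.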

Create a Python script in the Elasticsearch project directory

Now that we’ve covered the prerequisites, let’s start working on our Python script. Navigate to the directory for your Elasticsearch project (use mkdir to create a new directory if needed):

cd elasticsearch-project

Use the touch command to create a Python script:

touch elastic_json.py

The new Python file should now be in your project directory. You can use a terminal-based editor such as vim or nano, or a graphical editor such as gedit; however, it’s best to use an IDE that supports Python indentation and syntax highlighting locally. This will help you avoid frustrating syntax or indentation errors.

Import the Python package libraries for JSON and Elasticsearch

The first thing we’ll do in our script is import the necessary libraries. If a library isn’t imported, Python raises a NameError as soon as the script references one of its attributes or methods. Here are the libraries we’ll need:

# import the built-in Python JSON library
import json

# import Datetime for the document's timestamp
from datetime import datetime

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

NOTE: The json library is native to Python 3; therefore, it doesn’t need to be installed with pip3. However, it still needs to be imported before you’re able to use it.

Get familiar with Python’s JSON library before using it with Elasticsearch

Before digging into the code and trying to use the Python JSON library with Elasticsearch, it’s a good idea to play around with the JSON library and check out its attributes. You can type python3 in a terminal to use a Python interpreter or idle3 to use the IDLE Python environment. Two methods to take a closer look at are loads() and dumps():

import json
dir(json)
dir(json.loads)
dir(json.dumps)

Python’s dir() function can be very useful for getting more attribute information about a particular package, library or method in Python.
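The raw dir() output includes a lot of double-underscore names that are rarely interesting. As a small sketch, the list can be filtered down to the library's public names; the variable name public_names is just for this illustration.

```python
import json

# keep only the names that don't start with an underscore
public_names = [name for name in dir(json) if not name.startswith("_")]
print(public_names)

# dumps() and loads() are both part of the public interface
print("dumps" in public_names, "loads" in public_names)
```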

Screenshot of Python's IDLE getting the attributes of the JSON library

When you call the json.dumps() method, it returns a JSON string of whatever Python dictionary was passed to it, but it will raise a TypeError if the dictionary contains an object that can’t be serialized to JSON. This is a great way to catch potential JSON errors before the data gets passed to Elasticsearch. Conversely, the json.loads() method will take a valid JSON string and return the matching Python object (a dictionary, in the case of a JSON object), and it will raise a ValueError if the string isn’t valid JSON.
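A quick round trip shows the two methods side by side. This is only a sketch with made-up field values; notice how Python's False becomes JSON's lowercase false in the string and comes back as False after loading.

```python
import json

doc = {
    "string field": "Object Rocket articles",
    "integer field": 42,
    "boolean field": False,
}

# dict -> JSON string
json_str = json.dumps(doc)
print(type(json_str), json_str)

# JSON string -> dict
restored = json.loads(json_str)
print(type(restored), restored == doc)
```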

Catch JSON errors using a try-except error handling block in Python

You can catch errors in Python using a try-except block. Let’s check out an example of an invalid JSON string being passed to json.loads(). This will raise a json.JSONDecodeError, which is a subclass of ValueError, so the except clause in the block below catches it:

# import the built-in Python JSON library
import json

# here's an example of an invalid JSON string
bad_json = '{"this is": "missing the closing brace"'

# json.loads() will raise a ValueError if the JSON is invalid
try:
    json.loads(bad_json)
except ValueError as error:
    print("Error type:", type(error))
    print("json.loads() ValueError for JSON object:", error)
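If you'd like more detail than the error's string form, the exception raised by json.loads() also carries the position of the problem. The sketch below wraps that in a small helper; parse_json is a hypothetical name chosen for this example, while msg, lineno and colno are attributes of json.JSONDecodeError.

```python
import json


def parse_json(text):
    """Return (data, None) on success, or (None, message) if the JSON is invalid."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as error:
        message = f"{error.msg} at line {error.lineno}, column {error.colno}"
        return None, message


data, problem = parse_json('{"this is": "valid"}')
bad_data, bad_problem = parse_json('{"this is": "missing the closing brace"')

print(data, problem)
print(bad_data, bad_problem)
```

Returning the message instead of printing it lets the calling code decide whether to log the problem, skip the document, or stop altogether.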

Build an Elasticsearch JSON document with key-value pairs in a Python dictionary

Much like a JSON object, a Python dictionary is simply a set of key-value pairs. One dictionary can also be nested inside another dictionary as a value. Let’s look at an example of how we can build a nested Elasticsearch document in Python. In our example, we’ll build this nested document by assigning the document’s _source data, in the form of a Python dict object, to the "_source" key of another dictionary. If this sounds a bit confusing, there’s no need to worry: it will become much clearer as we look at the code.

Declare a Python dictionary for the Elasticsearch document’s _source data

Notice that this example uses several different data types for its document fields and uses Python’s datetime library to create a valid timestamp:

# _source data for the Elasticsearch document
doc_source = {
    "string field": "Object Rocket articles",
    "integer field": 42,
    "boolean field": False,
    # must be a string for JSON serialization
    "timestamp": str(datetime.now())
}

NOTE: The datetime timestamp must be explicitly cast as a string; if not, the JSON library’s dumps() method will throw a TypeError:

TypeError: Object of type 'datetime' is not JSON serializable
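Calling str() on the timestamp is one way around this error, but not the only one. The sketch below shows two alternatives using a fixed, made-up datetime so the output is predictable: converting with isoformat(), and passing default=str so that dumps() falls back to str() for anything it can't serialize on its own.

```python
import json
from datetime import datetime

# a fixed timestamp, chosen only for illustration
stamp = datetime(2019, 8, 14, 12, 30, 0)

# option 1: convert explicitly; isoformat() returns an ISO 8601 string
explicit = json.dumps({"timestamp": stamp.isoformat()})

# option 2: let dumps() call str() on objects it can't serialize itself
fallback = json.dumps({"timestamp": stamp}, default=str)

print(explicit)  # {"timestamp": "2019-08-14T12:30:00"}
print(fallback)  # {"timestamp": "2019-08-14 12:30:00"}
```

The ISO 8601 form is usually the safer choice if you want Elasticsearch to treat the field as a date, although that ultimately depends on the index mapping.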

Declare the Elasticsearch Python dictionary by nesting the _source data inside it

# Elasticsearch document structure as a Python dict
doc = {
    "_index": "some_index",
    "_id": 12345,
    "doc_type": "_doc", # doc type deprecated
    "_source": doc_source,
}

Create a JSON string of the Python dictionary using json.dumps()

You can use the indent parameter to specify the number of spaces for each indentation in the JSON object. This serves to “prettify” the string, making it easier to read:

# use the 'indent' parameter with json.dumps() for more readable JSON
json_str = json.dumps(doc, indent=4)
print("\nDOC json_str:", json_str)
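Indentation only changes how the string looks, not the data it carries. As a rough illustration with a trimmed-down document, the example below builds a pretty string and a compact one (using the separators parameter) and confirms that both load back into the same dictionary.

```python
import json

doc = {"_index": "some_index", "_id": 12345, "_source": {"integer field": 42}}

# readable version, spread over several lines
pretty = json.dumps(doc, indent=4)

# compact version, with no spaces after commas and colons
compact = json.dumps(doc, separators=(",", ":"))

print(pretty)
print(compact)

# both strings describe exactly the same data
print(json.loads(pretty) == json.loads(compact) == doc)
```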

Screenshot comparing two JSON dict objects representing Elasticsearch documents created in Python

Build an Elasticsearch JSON document from a simple Python dictionary

At this point, we’re ready to prepare the JSON document that we’ll be passing to the Elasticsearch client. This document will be built from our Python dictionary object:

# build the Elasticsearch document from a dict
build_doc = {}
build_doc["_index"] = "some_index"
build_doc["_id"] = 12345
build_doc["doc_type"] = "_doc" # doc type deprecated
build_doc["_source"] = doc_source

# print the dict's key-value pairs
print("\nbuild_doc items:", build_doc.items())
json_str = json.dumps(build_doc)
print("json_str without indent:", json_str)

Compare the two JSON document dictionary objects in Python

# compare this with the previous JSON dict
print("\nJSON objects are the same:", build_doc == doc)
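The comparison works because two Python dictionaries are equal whenever they hold the same key-value pairs, regardless of the order in which the keys were added. JSON strings are stricter, since dumps() writes keys in insertion order. Here's a short sketch with two small, made-up dictionaries:

```python
import json

first = {"_index": "some_index", "_id": 12345}

# same pairs, added in the opposite order
second = {}
second["_id"] = 12345
second["_index"] = "some_index"

# the dicts are equal
print(first == second)  # True

# the JSON strings differ, because key order is preserved
print(json.dumps(first) == json.dumps(second))  # False

# sorting the keys makes the strings comparable
print(json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True))  # True
```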

Create a JSON string of the Elasticsearch document dictionary and make an API call

Here, we create our JSON string using json.dumps() and make sure it doesn’t raise any errors:

# connect to the cluster (assumes a local node on the default port)
client = Elasticsearch("http://localhost:9200")

try:
    # create JSON string of doc _source data
    json_source = json.dumps(build_doc["_source"])

    # get the dict object's _id
    json_id = build_doc["_id"]

Finally, still inside the try block, we make an API call to index this document in Elasticsearch:

    # make an API call to the Elasticsearch cluster
    response = client.index(
        index = 'some_index',
        doc_type = '_doc', # deprecated in Elasticsearch 7; omit it with newer clients
        id = json_id,
        body = json_source
    )

    # print a pretty response to the index() method call response
    print("\nclient.index response:", json.dumps(response, indent=4))

except Exception as error:
    print("Error type:", type(error))
    print("client.index() ERROR:", error)
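If you index many documents that follow the same dictionary layout, it may help to pull the pieces apart in one place. The following is only a sketch: index_kwargs is a hypothetical helper, not part of the Elasticsearch client, and it runs without a cluster because it only prepares the arguments.

```python
import json


def index_kwargs(doc):
    """Split a document dict into keyword arguments for client.index()."""
    return {
        "index": doc["_index"],
        "id": doc["_id"],
        "body": json.dumps(doc["_source"]),
    }


doc = {
    "_index": "some_index",
    "_id": 12345,
    "_source": {"string field": "Object Rocket articles", "integer field": 42},
}

kwargs = index_kwargs(doc)
print(kwargs)

# with a connected client, the call would then look like:
# response = client.index(**kwargs)
```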

Create a JSON Elasticsearch query using a Python dictionary

# build a JSON Python dict to query all documents in an Elasticsearch index
all_docs = {}
all_docs["size"] = 9999
all_docs["query"] = {"match_all" : {}}
print("\nall_docs:", all_docs)
print("all_docs TYPE:", type(all_docs))
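The same approach works for narrower queries. As an illustrative sketch, the hypothetical helper below (match_query is a name invented for this example) builds a body for a match query, which is part of Elasticsearch's Query DSL, against a single field:

```python
import json


def match_query(field, text, size=10):
    """Build a search body that matches 'text' against one field."""
    return {"size": size, "query": {"match": {field: text}}}


body = match_query("string field", "articles", size=5)
print(json.dumps(body, indent=4))
```

Because the result is an ordinary dictionary, it can be passed to the client's search() method in the same way as a match_all query body.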

Pass the Python dictionary to the Elasticsearch client’s search() method

# wrap the API call in a try-except block to catch any errors
try:
    # pass the Python dict in an API call to the Elasticsearch cluster
    response = client.search(
        index = "some_index",
        body = all_docs
    )

    # print all of the documents in the Elasticsearch index
    print("all_docs query response:", response)

    # use the dumps() method's 'indent' parameter to print a pretty response
    print("all_docs pretty:", json.dumps(response, indent=4))

except Exception as error:
    print("Error type:", type(error))
    print("client.search() ERROR:", error)

Conclusion

It’s clear that using the Python JSON library with Elasticsearch can make your development tasks easier and more efficient. Now that we’ve finished creating our script, all we need to do is run it. You can run the Python script using the python3 command followed by the file name in a terminal or command prompt window. Be sure to specify the path or navigate to the location of the Python script when you execute the script.

At the end of your terminal output, you should see something like the following:

...
all_docs: {'size': 9999, 'query': {'match_all': {}}}
all_docs TYPE: <class 'dict'>
all_docs query response: {'took': 3, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 1, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'some_index', '_type': '_doc', '_id': '12345', '_score': 1.0, '_source': {'string field': 'Object Rocket articles', 'integer field': 42, 'boolean field': False, 'timestamp': '2019-08-09 21:08:41.075662'}}]}}
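The response above is a nested Python dictionary, so the matching documents can be pulled out with ordinary key lookups. The sketch below uses a shortened, hand-written copy of that response rather than a live cluster; sources_by_id is a hypothetical helper name.

```python
# a trimmed copy of the response shape printed above
response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "some_index",
                "_id": "12345",
                "_score": 1.0,
                "_source": {
                    "string field": "Object Rocket articles",
                    "integer field": 42,
                },
            },
        ],
    },
}


def sources_by_id(resp):
    """Map each hit's _id to its _source dictionary."""
    return {hit["_id"]: hit["_source"] for hit in resp["hits"]["hits"]}


docs = sources_by_id(response)
print(docs["12345"]["integer field"])  # 42
```

Note that the _id comes back as a string ('12345') even though the document was indexed with an integer ID.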

Screenshot of API calls made in Python by passing JSON strings to Elasticsearch client methods
