How to Create a Python Class Designed to Construct Elasticsearch Documents


Introduction

This tutorial will explain how to create a Python class for constructing Elasticsearch documents. In the Python programming language, all variables and pieces of data are fundamentally “objects” with attributes and methods, so a solid understanding of object-oriented programming (OOP) is essential to learning Python fundamentals. Creating a class for the Elasticsearch documents helps keep the code clean and applies the OOP principles of encapsulation and inheritance. This tutorial will cover how to build a simple Python class that helps construct Elasticsearch JSON documents in a simple and reliable manner using Python’s object-oriented conventions.

Prerequisites for Creating a Python Class Construct for Elasticsearch Documents

  • Python 3 must be installed and working properly.

  • pip3, the package installer for Python 3, must be installed on the same machine.

pip3 is usually installed along with Python 3. If it is missing, it can be installed from the APT repository on most versions of Debian and Ubuntu with the following command:

```shell
sudo apt install python3-pip
```

NOTE: The Elasticsearch client for Python is needed to make API calls with the document instances demonstrated in this tutorial. Execute the following pip3 command to install it:

```shell
pip3 install elasticsearch
```
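API method signatures differ between major versions of the Python client (the 8.x client, for instance, expects keyword arguments in calls such as index()), so it can help to confirm which version pip3 installed. The sketch below uses the standard library’s importlib.metadata module (Python 3.8+); the helper name installed_version is hypothetical.

```python
# check which version of a package pip3 installed (requires Python 3.8+)
from importlib.metadata import version, PackageNotFoundError


def installed_version(package_name):
    """Hypothetical helper: return the installed version string, or None if missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("elasticsearch client version:", installed_version("elasticsearch"))
```

If the script prints None, the client is not installed for the Python 3 interpreter that ran it.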

How to Create a Python Script and Import the Low-Level Elasticsearch Client

The Document class in this tutorial uses Python’s built-in json module. Import json, the randint() function and the Elasticsearch client class with the following script:

```python
# import the built-in JSON library
import json

# import the random integer method
from random import randint

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch
```

NOTE: This tutorial uses Python’s randint() function to generate example document data for the Elasticsearch client’s index() API call.

How to create a client instance of the Python Elasticsearch library

Use the following code to declare a client instance of the Elasticsearch client library, making sure to change the URL string to match the cluster’s host and port settings:

```python
# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")
```

How to declare a global string for the Elasticsearch index name

Declare a string for the index name, as shown below, outside of any functions or class constructors so it will have a global scope:

```python
# declare global string Elasticsearch index name
INDEX_NAME = "some_index"
```

The index name can then be used in API calls and passed to the document instances.

How to Construct a Class for Elasticsearch Documents in Python

Use Python’s class keyword to declare a class for the Elasticsearch documents, as shown in the script below. Make sure to include all the variables that must be passed as parameters to the initializer method, __init__().

```python
# Document class for the Elasticsearch documents
class Document:
    # class constructor for the Elasticsearch document
    def __init__(self, index, id, source):
        self.index = index
        self.id = id
        self.source = source
```
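As an aside, the same three attributes could be declared with the standard library’s dataclasses module, which writes the __init__() method automatically and adds a readable repr() and == comparison. This is an alternative to the hand-written constructor, not what the rest of this tutorial uses; the empty-dict default for source is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # @dataclass generates __init__(self, index, id, source) from these fields
    index: str
    id: int
    # default_factory gives every instance its own empty dict (assumed default)
    source: dict = field(default_factory=dict)


doc = Document(index="some_index", id=1, source={"string field": "Object Rocket rocks!"})
print(doc)
```

Running it prints Document(index='some_index', id=1, source={'string field': 'Object Rocket rocks!'}). A hand-written __init__() remains the simpler choice when the constructor needs extra logic, such as building the JSON dict described next.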

How to use the Python class constructor to build Elasticsearch JSON documents

The following code continues the __init__() method. It builds a Python dict containing the appropriate name/value pairs for an Elasticsearch JSON document:

```python
        # (continued inside __init__) returns empty dict as default
        self.json = {}

        try:
            # Elasticsearch document structure as a Python dict
            self.json = {
                "_index": INDEX_NAME,
                "_id": self.id,
                # doc_type deprecated as of v6 of Elasticsearch
                "doc_type": "_doc",
                "_source": self.source,
            }
            print("Elasticsearch Document JSON:", self.json)

        except Exception as error:
            print("Document JSON instance ERROR:", error)
```

As shown in the following screenshot, this example wraps the dict construction in a try-except block. It is intended to catch errors caused by malformed data passed to the "_source" field and to print (log) any errors instead of stopping the script.

Screenshot of the Document class constructor for the Elasticsearch documents in a Python script
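Note that assigning a dict literal rarely raises on its own, so the try-except block above will not reject, for example, a string passed as "_source". If the constructor should refuse malformed data, one option is an explicit type check. The sketch below assumes "_source" must always be a dict; the build_document_json() helper is hypothetical, and it omits the "doc_type" key.

```python
def build_document_json(index, doc_id, source):
    """Hypothetical helper: build the document dict, rejecting non-dict _source data."""
    # a dict literal accepts any value, so validate the _source type explicitly
    if not isinstance(source, dict):
        raise TypeError(f"_source must be a dict, got {type(source).__name__}")
    return {"_index": index, "_id": doc_id, "_source": source}


try:
    build_document_json("some_index", 1, "not a dict")
except TypeError as error:
    print("Document JSON instance ERROR:", error)
```

Raising early keeps bad documents from reaching the index() call; the tutorial’s approach of printing the error and continuing is more forgiving during testing.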

How to give the Python class a method that will return a JSON string of the Elasticsearch document

Use Python’s def keyword to declare a method for the Document class. The method serializes the document’s _source data (self.source) as a JSON string that can then be passed to the Elasticsearch low-level client’s API calls, as shown in the following script:

```python
    # define a method that will construct a JSON string
    def json_str(self):

        # attempt to create a JSON string of the document using json.dumps()
        try:
            # use the 'indent' parameter with json.dumps() for readable JSON
            doc = json.dumps(self.source, indent=4)
        except Exception as error:
            doc = "{}"
            print("Document json_str() ERROR:", error)
        return doc
```

This method also catches any errors raised by the json library’s dumps() method and, in such cases, returns the string "{}" (an empty JSON object).

The indent parameter adds line breaks and indentation to the JSON string so it is easier to read. Its integer value sets the number of spaces used for each indentation level.
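To see both behaviors of the method in isolation, the sketch below copies its logic into a standalone function (the module-level json_str() and its indent argument are illustrative, not part of the tutorial’s class). A set is used as the malformed value because json.dumps() cannot serialize sets.

```python
import json


def json_str(source, indent=4):
    """Standalone copy of Document.json_str(): serialize _source or fall back to "{}"."""
    try:
        return json.dumps(source, indent=indent)
    except (TypeError, ValueError) as error:
        # e.g. sets and datetime objects are not JSON serializable
        print("Document json_str() ERROR:", error)
        return "{}"


# indent=4 puts each key on its own line, indented by four spaces
print(json_str({"integer field": 42}))

# a set triggers the fallback, so "{}" is returned after the error is printed
print(json_str({"tags": {"a", "b"}}))
```

Passing indent=None produces compact single-line output, which is smaller to send but harder to read in a terminal.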

How to Create Instances of the Document Class for the Elasticsearch Documents

Many instances of the Document class can be declared and used to index Elasticsearch documents. The following code uses Python’s random library to generate some test data for the document instances:

```python
# declare list of string values for documents
random_str = ["rocks!", "helps!", "support", "services", "articles"]

# index 10 Elasticsearch documents with random values
for i in range(10):

    # pick a random string from the list for the field value
    ran_str = random_str[randint(0, len(random_str) - 1)]

    # _source data for the Elasticsearch document instance
    doc_source = {
        "string field": "Object Rocket " + ran_str,
        # random integer for doc
        "integer field": randint(1, 99999),
        # randomly select True or False
        "boolean field": [True, False][randint(0, 1)],
    }
```
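The loop above picks list items with randint() and an index; random.choice() does the same in one call. The standalone sketch below also uses a seeded random.Random instance so the same test documents are generated on every run (the seed 42 is arbitrary, and the names STRINGS and random_source are hypothetical).

```python
import random

STRINGS = ["rocks!", "helps!", "support", "services", "articles"]


def random_source(rng):
    """Hypothetical helper: build one random _source dict like the loop above."""
    return {
        "string field": "Object Rocket " + rng.choice(STRINGS),
        "integer field": rng.randint(1, 99999),
        "boolean field": rng.choice([True, False]),
    }


# a seeded generator makes the "random" test data repeatable between runs
rng = random.Random(42)
sources = [random_source(rng) for _ in range(10)]
print(sources[0])
```

Repeatable data makes it easier to compare the documents printed by the script with what later shows up in Kibana.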

How to Instantiate the Elasticsearch Documents and Pass the Documents to the Client’s Index() Method

Execute the following script, inside the same for loop, to declare an instance of the Document class. The class’s json_str() method returns a string of the document’s _source data, which can then be passed to the low-level client instance’s index() method to index the document into the index name specified earlier:

```python
    # instantiate a new Elasticsearch Document passing random _source data
    doc = Document(
        index=INDEX_NAME,
        id=i,  # _id is position in range iterator
        source=doc_source  # pass dict for doc _source data
    )

    # vars() argument must have __dict__ attribute
    if hasattr(doc, "__dict__"):

        # print the properties and attributes of the Document instance
        print("\nNUM", i, "-->", vars(doc))

    # print the attributes of Document instance
    print(doc.json)
    print("\ndoc.json_str():", doc.json_str())
```
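Instances of an ordinary class like Document always have a __dict__, so the hasattr() check above is mostly defensive; it matters for objects such as integers or classes that define __slots__, where vars() raises a TypeError. The trimmed copy of the class below (it drops "doc_type" and the print() calls) shows what vars() returns for one instance.

```python
class Document:
    # trimmed copy of the tutorial's class: same attributes, no printing
    def __init__(self, index, id, source):
        self.index = index
        self.id = id
        self.source = source
        self.json = {"_index": index, "_id": id, "_source": source}


doc = Document(index="some_index", id=0, source={"integer field": 7})

# vars() returns the instance's __dict__: every attribute set in __init__()
print(vars(doc))
```

This prints {'index': 'some_index', 'id': 0, 'source': {'integer field': 7}, 'json': {'_index': 'some_index', '_id': 0, '_source': {'integer field': 7}}}, with keys in the order the attributes were assigned.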

How to pass the document instance to the client’s index() method and get a response from the Elasticsearch cluster

The Elasticsearch cluster should return a dict response showing what occurred with the API call. The response’s 'result' key indicates whether the document was successfully created or updated:

```python
    # attempt to index the Elasticsearch document string
    try:
        # keyword arguments also work with newer client versions, which reject positional ones
        resp = client.index(index=doc.index, body=doc.json_str(), id=doc.id)
        print("Document index() response:", resp, "\n")
    except Exception as error:
        print("client.index() ERROR:", error, "\n")
```
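Rather than reading the whole printed response, the script could check the 'result' key directly. The sketch below is a hypothetical helper, was_indexed(), that treats "created" (a new _id) and "updated" (an existing _id) as success; the dicts passed to it are illustrative fragments, not real cluster output.

```python
def was_indexed(resp):
    """Hypothetical helper: True if an index() response reports a create or update."""
    try:
        # "created" for a new _id, "updated" when the _id already existed
        result = resp["result"]
    except KeyError:
        return False
    return result in ("created", "updated")


# illustrative response fragments only
print(was_indexed({"_index": "some_index", "_id": "0", "result": "created"}))
print(was_indexed({}))
```

Running the tutorial’s script a second time would be expected to report "updated", because the loop reuses the same _id values 0 through 9.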

How to execute the Python script in a terminal window to index the Elasticsearch documents

Navigate to the directory containing the Python script and run it using the python3 command. An output of the document instances and the API call response from Elasticsearch should be visible in the window. The results should resemble the following:

Screenshot of a terminal window getting a response from the Elasticsearch cluster using Python

How to search the documents indexed into Elasticsearch using Kibana

Make a GET HTTP request to the Elasticsearch cluster to verify that the documents were indexed properly. This can be done in the Kibana Console UI by navigating to “Dev Tools” and making the following request:

```
GET some_index/_search
```

The results should resemble the following:

Screenshot of Elasticsearch documents constructed and indexed in Python in Kibana
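The same check can be made from Python. The hypothetical helper below pulls the _id and _source of each hit out of a _search response; the sample dict is a trimmed, illustrative response shape rather than real output. With a live cluster, it could be given the result of client.search(index=INDEX_NAME), which, like the Kibana request, returns only the first 10 hits by default.

```python
def hit_sources(search_resp):
    """Hypothetical helper: return (_id, _source) pairs from a _search response."""
    return [(hit["_id"], hit["_source"]) for hit in search_resp["hits"]["hits"]]


# trimmed, illustrative shape of a _search response body
sample = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_index": "some_index", "_id": "0", "_source": {"integer field": 7}},
        ],
    }
}
print(hit_sources(sample))
```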

Conclusion

This tutorial explained how to create a Python class for constructing Elasticsearch documents. The article specifically covered how to create a Python script, import the low-level Elasticsearch client and create a client instance of the Python Elasticsearch library. The tutorial also covered how to declare a global string for the Elasticsearch index name, construct a class for the Elasticsearch documents in Python, create instances of the Document class, pass the documents to the client’s index() method and search the indexed documents using Kibana. Remember, in order to execute the examples in this tutorial, Python 3 and pip3, the package installer for Python 3, must be installed on the same machine.

Just the Code

```python
#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the built-in JSON library
import json

# import the random integer method
from random import randint

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")

# declare global string Elasticsearch index name
INDEX_NAME = "some_index"

# Document class for the Elasticsearch documents
class Document:
    # class constructor for the Elasticsearch document
    def __init__(self, index, id, source):
        self.index = index
        self.id = id
        self.source = source

        # returns empty dict as default
        self.json = {}

        try:
            # Elasticsearch document structure as a Python dict
            self.json = {
                "_index": INDEX_NAME,
                "_id": self.id,
                "doc_type": "_doc",
                "_source": self.source,
            }
            print("Elasticsearch Document JSON:", self.json)

        except Exception as error:
            print("Document JSON instance ERROR:", error)

    # define a method that will construct a JSON string
    def json_str(self):

        # attempt to create a JSON string of the document using json.dumps()
        try:
            # use the 'indent' parameter with json.dumps() for more readable JSON
            doc = json.dumps(self.source, indent=4)
        except Exception as error:
            doc = "{}"
            print("Document json_str() ERROR:", error)
        return doc

# declare list of string values for documents
random_str = ["rocks!", "helps!", "support", "services", "articles"]

# index 10 Elasticsearch documents with random values
for i in range(10):

    # pick a random string from the list for the field value
    ran_str = random_str[randint(0, len(random_str) - 1)]

    # _source data for the Elasticsearch document instance
    doc_source = {
        "string field": "Object Rocket " + ran_str,
        # random integer for doc
        "integer field": randint(1, 99999),
        # randomly select True or False
        "boolean field": [True, False][randint(0, 1)],
    }

    # instantiate a new Elasticsearch Document passing random _source data
    doc = Document(
        index=INDEX_NAME,
        id=i,  # _id is position in range iterator
        source=doc_source  # pass dict for doc _source data
    )

    # vars() argument must have __dict__ attribute
    if hasattr(doc, "__dict__"):

        # print the properties and attributes of the Document instance
        print("\nNUM", i, "-->", vars(doc))

    # print the attributes of Document instance
    print(doc.json)
    print("\ndoc.json_str():", doc.json_str())

    # attempt to index the Elasticsearch document string
    try:
        resp = client.index(index=doc.index, body=doc.json_str(), id=doc.id)
        print("Document index() response:", resp)
        print("response TYPE:", type(resp), "\n")
    except Exception as error:
        print("client.index() ERROR:", error, "\n")
```
