How to Create a Python Class Designed to Construct Elasticsearch Documents
Introduction
This tutorial explains how to create a Python class for constructing Elasticsearch documents. Every variable and piece of data in the Python programming language is fundamentally an object. Because all objects have attributes and methods, a solid understanding of object-oriented programming (OOP) is essential to learning the Python fundamentals. Creating a class for Elasticsearch documents helps produce cleaner code and applies the OOP principles of encapsulation and inheritance. This tutorial covers how to build a simple Python class that helps users construct Elasticsearch JSON documents in a simple and reliable manner using Python's object-oriented conventions.
Prerequisites for Creating a Python Class Construct for Elasticsearch Documents
Python 3 must be installed and working properly.
The pip3 package installer for Python 3 must be installed on the same machine.
Typically, the pip3 command is available by default. If it is not, it can be installed from the APT repository on most versions of Debian and Ubuntu with the following command:
```bash
sudo apt install python3-pip
```
NOTE: The Elasticsearch client for Python is required to make API calls with the document instances demonstrated in this tutorial. Execute the following pip3 command to install it:
```bash
pip3 install elasticsearch
```
How to Create a Python Script and Import the Low-Level Elasticsearch Client
The Python class in this tutorial uses Python’s native JSON library. Import json and the Elasticsearch class with the following script:
```python
# import the built-in JSON library
import json

# import the random integer method
from random import randint

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch
```
NOTE: This tutorial uses Python’s randint() function to generate some example document data for the Elasticsearch client’s index() API call.
How to create a client instance of the Python Elasticsearch library
Use the following code to declare a client instance of the Elasticsearch client library, making sure to change the URL string to match the current cluster and domain settings:
```python
# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")
```
How to declare a global string for the Elasticsearch index name
Declare a string for the index name, as shown below, outside of any functions or class constructors so it will have a global scope:
```python
# declare global string Elasticsearch index name
INDEX_NAME = "some_index"
```
This allows the variable to be used in API calls and passed to the document instances.
How to Construct a Class for Elasticsearch Documents in Python
Use Python’s class keyword to declare a class for the Elasticsearch documents, as in the script below. Make sure the initializer (__init__) lists every variable that must be passed in as a parameter.
```python
# Document class for the Elasticsearch documents
class Document:
    # class constructor for the Elasticsearch document
    def __init__(self, index, id, source):
        self.index = index
        self.id = id
        self.source = source
```
How to use the Python class constructor to build Elasticsearch JSON documents
The following code, placed inside the class’s __init__ method, builds a dictionary with the appropriate name/value pairs for an Elasticsearch JSON document:
```python
        # returns empty dict as default
        self.json = {}

        try:
            # Elasticsearch document structure as a Python dict
            self.json = {
                "_index": INDEX_NAME,
                "_id": self.id,
                # doc_type deprecated as of v6 of Elasticsearch
                "doc_type": "_doc",
                "_source": self.source,
            }
            print("Elasticsearch Document JSON:", self.json)
        except Exception as error:
            print("Document JSON instance ERROR:", error)
```
This example uses a try-except block to catch any errors that may arise. It is designed both to catch malformed data that may be passed to the "_source" field and to log any errors.
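To see the dictionary the initializer builds without a running cluster, the class can be exercised on its own. The following is a minimal sketch with illustrative sample values (not taken from a real index). Unlike the tutorial's version, it reads the index name from self.index rather than a global constant, a choice made here only to keep the block self-contained, and it omits the try-except and print calls for brevity.

```python
# Minimal, self-contained sketch of the Document class (no cluster required).
# Assumption: "_index" is taken from self.index instead of a global constant.
class Document:
    def __init__(self, index, id, source):
        self.index = index
        self.id = id
        self.source = source

        # Elasticsearch document structure as a Python dict
        self.json = {
            "_index": self.index,
            "_id": self.id,
            "_source": self.source,
        }

# hypothetical sample data, for illustration only
doc = Document(
    index="some_index",
    id=1,
    source={"string field": "Object Rocket rocks!"},
)

print(doc.json["_index"], doc.json["_id"])
print(doc.json["_source"])
```

Each instance carries its own json attribute, so many documents can be built side by side without sharing state.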
How to give the Python class a method that will return a JSON string of the Elasticsearch document
Use Python’s def keyword to declare a method for the Document class. The method takes the document’s _source data, declared above, and returns it as a JSON string that can then be passed to the Elasticsearch low-level client’s API calls, as shown in the following script:
```python
    # define a function that will construct a JSON string
    def json_str(self):

        # attempt to create a JSON string of the document using json.dumps()
        try:
            # use the 'indent' parameter with json.dumps() for readable JSON
            doc = json.dumps(self.source, indent=4)
        except Exception as error:
            doc = "{}"
            print("Document json_string() ERROR:", error)
        return doc
```
This method also catches any errors that may arise while calling the JSON library’s dumps() function and returns the string form of an empty dict ("{}") in such cases.
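The fallback can be checked with a value the json library cannot serialize. The sketch below is a standalone function that mirrors the method's try-except logic; safe_json_str is a hypothetical name, not part of the tutorial's class. A Python set is used as the unserializable value because json.dumps() raises a TypeError for it.

```python
import json

def safe_json_str(source):
    # hypothetical helper mirroring json_str(): fall back to "{}" on failure
    try:
        return json.dumps(source, indent=4)
    except Exception as error:
        print("json.dumps() ERROR:", error)
        return "{}"

# a plain dict serializes normally
good = safe_json_str({"integer field": 42})

# a set is not JSON serializable, so the fallback string is returned
bad = safe_json_str({"tags": {"a", "b"}})

print(good)
print(bad)
```

One consequence of this design is that a failed serialization still yields a valid JSON string, so a caller that needs to distinguish failure from a genuinely empty document would have to check for it separately.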
Use the indent parameter to add whitespace to the JSON string so it is easier to read. The integer value passed to indent sets the number of spaces used for each indentation level.
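The effect of indent is easiest to see side by side. This short sketch, using an illustrative sample dict, serializes the same data with and without the parameter:

```python
import json

# illustrative sample data, not from a real index
source = {"string field": "Object Rocket rocks!", "boolean field": True}

# default output: everything on a single line
compact = json.dumps(source)

# indent=4: one key per line, nested four spaces deep
pretty = json.dumps(source, indent=4)

print(compact)
print(pretty)
```

Both strings decode to the same data; only the whitespace differs, so the indented form is a readability choice rather than a requirement.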
How to Create Instances of the Document Class for the Elasticsearch Documents
Many instances of the Document class can be declared and used to index Elasticsearch documents. The following code is an example that uses Python’s random library to generate some test data for the document instances:
```python
# declare list of string values for documents
random_str = ["rocks!", "helps!", "support", "services", "articles"]

# index 10 Elasticsearch documents with random variables
for i in range(10):

    # concatenate string from random string in list for field value
    ran_str = random_str[randint(0, len(random_str)-1)]

    # _source data for the Elasticsearch document instance
    doc_source = {
        "string field": "Object Rocket " + ran_str,
        # random integer for doc
        "integer field": randint(1, 99999),
        # randomly select 'true' or 'false'
        "boolean field": [True, False][randint(0, 1)],
    }
```
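As an alternative idiom, Python's random.choice() picks an element from a sequence directly, which avoids the index arithmetic. The sketch below is a variation on the loop above, not the tutorial's own code; make_source is a hypothetical helper name, and the seed is set only so that repeated runs produce the same values.

```python
import random

random.seed(0)  # assumption: fixed seed, used only for repeatable output

random_str = ["rocks!", "helps!", "support", "services", "articles"]

def make_source():
    # hypothetical helper that builds one _source dict of test data
    return {
        "string field": "Object Rocket " + random.choice(random_str),
        "integer field": random.randint(1, 99999),
        "boolean field": random.choice([True, False]),
    }

# build ten _source dicts, one per document
sources = [make_source() for _ in range(10)]
print(sources[0])
```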
How to Instantiate the Elasticsearch Documents and Pass the Documents to the Client’s Index() Method
Add the following code inside the for loop to declare an instance of the Document class. The class’s json_str() method returns a string of the document’s _source data, which can then be passed to the low-level client instance’s index() method to index the document into the index specified earlier:
```python
    # instantiate a new Elasticsearch Document passing random _source data
    doc = Document(
        index=INDEX_NAME,
        id=i, # _id is position in range iterator
        source=doc_source # pass dict for doc _source data
    )

    # vars() argument must have __dict__ attribute
    if hasattr(doc, "__dict__"):
        # print the properties and attributes of the Document instance
        print("\nNUM", i, "-->", vars(doc))

    # print the attributes of Document instance
    print(doc.json)
    print("\ndoc.json_str():", doc.json_str())
```
How to pass the document instance to the client’s index() method and get a response from the Elasticsearch cluster
The Elasticsearch cluster should return a dict response describing the outcome of the API call. The response’s 'result' key indicates whether the document was successfully indexed ("created") or updated ("updated"):
```python
    # attempt to index the Elasticsearch document string
    try:
        # pass the index name as a keyword argument for client compatibility
        resp = client.index(index=doc.index, body=doc.json_str(), id=doc.id)
        print("Document index() response:", resp, "\n")
    except Exception as error:
        print("client.index() ERROR:", error, "\n")
```
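Once a response is returned, its 'result' key can be checked before moving on. The following sketch assumes a response shaped like the dict described above; sample_resp is a hand-written, abbreviated stand-in rather than output from a real cluster, and was_indexed is a hypothetical helper name.

```python
def was_indexed(resp):
    # hypothetical helper: True when the document was created or updated
    return resp.get("result") in ("created", "updated")

# hand-written stand-in for an index() response (abbreviated, illustrative)
sample_resp = {"_index": "some_index", "_id": "0", "result": "created"}

print(was_indexed(sample_resp))

# a response without a 'result' key is treated as not indexed
print(was_indexed({}))
```

Using .get() rather than direct key access keeps the check from raising a KeyError if the response is missing the key.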
How to execute the Python script in a terminal window to index the Elasticsearch documents
Navigate to the directory containing the Python script and run it using the python3 command. The output of the document instances and the API call responses from Elasticsearch should be visible in the terminal window.
How to search the documents indexed into Elasticsearch using Kibana
Make a GET HTTP request to the Elasticsearch cluster to verify that the documents were indexed properly. This can be done in the Kibana Console UI by navigating to “Dev Tools” and making the following request:
```
GET some_index/_search
```
The response should list the indexed documents in its hits.hits array.
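A search response can also be read in Python. It nests the matching documents under hits.hits, each with its own _source. The sketch below pulls the _source dicts out of such a response; sample_response is a hand-written, heavily abbreviated stand-in (not real output from a cluster), and extract_sources is a hypothetical helper name.

```python
def extract_sources(response):
    # hypothetical helper: collect the _source dict of every hit
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]

# hand-written, abbreviated stand-in for a _search response
sample_response = {
    "hits": {
        "hits": [
            {"_index": "some_index", "_id": "0",
             "_source": {"string field": "Object Rocket rocks!"}},
            {"_index": "some_index", "_id": "1",
             "_source": {"string field": "Object Rocket helps!"}},
        ]
    }
}

for source in extract_sources(sample_response):
    print(source["string field"])
```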
Conclusion
This tutorial explained how to create a Python class for constructing Elasticsearch documents. It covered how to create a Python script, import the low-level Elasticsearch client and create a client instance of the Python Elasticsearch library. It also covered how to declare a global string for the Elasticsearch index name, construct a class for the Elasticsearch documents, create instances of the Document class, pass the documents to the client’s index() method and search the indexed documents using Kibana. Remember, in order to execute the examples in this tutorial, Python 3 and the pip3 package installer must be installed on the same machine.
Just the Code
```python
#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the built-in JSON library
import json

# import the random integer method
from random import randint

# import the Elasticsearch low-level client library
from elasticsearch import Elasticsearch

# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")

# declare global string Elasticsearch index name
INDEX_NAME = "some_index"

# Document class for the Elasticsearch documents
class Document:

    # class constructor for the Elasticsearch document
    def __init__(self, index, id, source):
        self.index = index
        self.id = id
        self.source = source

        # returns empty dict as default
        self.json = {}

        try:
            # Elasticsearch document structure as a Python dict
            self.json = {
                "_index": INDEX_NAME,
                "_id": self.id,
                "doc_type": "_doc",
                "_source": self.source,
            }
            print("Elasticsearch Document JSON:", self.json)
        except Exception as error:
            print("Document JSON instance ERROR:", error)

    # define a function that will construct a JSON string
    def json_str(self):

        # attempt to create a JSON string of the document using json.dumps()
        try:
            # use the 'indent' parameter with json.dumps() for more readable JSON
            doc = json.dumps(self.source, indent=4)
        except Exception as error:
            doc = "{}"
            print("Document json_string() ERROR:", error)
        return doc

# declare list of string values for documents
random_str = ["rocks!", "helps!", "support", "services", "articles"]

# index 10 Elasticsearch documents with random variables
for i in range(10):

    # concatenate string from random string in list for field value
    ran_str = random_str[randint(0, len(random_str)-1)]

    # _source data for the Elasticsearch document instance
    doc_source = {
        "string field": "Object Rocket " + ran_str,
        # random integer for doc
        "integer field": randint(1, 99999),
        # randomly select 'true' or 'false'
        "boolean field": [True, False][randint(0, 1)],
    }

    # instantiate a new Elasticsearch Document passing random _source data
    doc = Document(
        index=INDEX_NAME,
        id=i, # _id is position in range iterator
        source=doc_source # pass dict for doc _source data
    )

    # vars() argument must have __dict__ attribute
    if hasattr(doc, "__dict__"):
        # print the properties and attributes of the Document instance
        print("\nNUM", i, "-->", vars(doc))

    # print the attributes of Document instance
    print(doc.json)
    print("\ndoc.json_str():", doc.json_str())

    # attempt to index the Elasticsearch document string
    try:
        # pass the index name as a keyword argument for client compatibility
        resp = client.index(index=doc.index, body=doc.json_str(), id=doc.id)
        print("Document index() response:", resp)
        print("response TYPE:", type(resp), "\n")
    except Exception as error:
        print("client.index() ERROR:", error, "\n")
```