Guide How To Create Test Data In Python And Convert It Into An Elasticsearch JSON File

Written by Data Pilot

April 08, 2019

Elasticsearch
JSON
Python

Have a Database Problem? Speak with an Expert for Free
Get Started >>

Introduction

If you have a large dataset that is stored as a nested dictionary in Python code, and you want to be able to use it in Elasticsearch, it must be formatted for compatibility. This tutorial will show the method to convert that Python data into a JSON file that the Elastic search engine can use. The information contained here will explain every step in the process and provide valuable links to more documentation on related topics. This tutorial is also very useful if you want/need to learn how to generate random test data in the Python language and then use it with the Elastic Stack.

Prerequisites:

This article assumes the user is on a UNIX-based machine, like macOS or Linux, but the Python code will work on Windows machines as well.
It’s recommended that you use Python 3 since Python 2 is depreciating by 2020.
Get the latest stable release of Python 3.

The Python 3 commands for python3 and the pip3 package manager each respectively follow this format (ending with a 3) :

1 2	python3 some_script.py pip3 install examplemodule

Screenshot of a UNIX terminal getting the Python3 and pip3 Package Manager Versions

The Elastic Stack needs to be installed and running in order to import data into the target cluster. Be sure to have an existing index created on your cluster that you can put the Python data into.

Have SSH access to your server with a private key

Structure of a Python Nested Dictionary:

A Python nested dictionary is essentially a dictionary inside of a dictionary

The json Python module takes a dictionary and converts it into a JSON file:

Python's JSON Dumps Example

Setup the Python Script:

Create a new directory for the Python app:

1 2	mkdir python2elastic cd python2elastic

If you haven’t created the Python script yet, you can use touch to create one:

1	sudo touch elastic_json.py

Edit the Python Script:

Use nano to edit the script if you’re using a UNIX terminal.
Once you’re inside the file, import the json Python module that was included with “The Python Standard Library”:

1	import json

Importing Python Modules:

Import the uuid and random libraries so that you can generate test data using random UUID’s for the Elasticsearch documents’ keys:

1	import uuid, random

Other modules to import as well:

1	import os, pickle

Create a Randomized Python Dictionary:

If you want to start with creating some test data, you can use this code to randomly generated values and then stick them into a nested Python dictionary:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45

# create an empty dictionary
test_data = {}

# specify the number of keys (or 'documents') to create
total_documents = 10

print ("\n\nCreating " + str(total_documents) + " Elasticsearch documents:")
for key in range(total_documents):

# give each pet document a UUID
new_uuid = uuid.uuid4()

# convert UUID to string
new_uuid = str(new_uuid)

# create a nest dictionary which represents the Elasticsearch document
test_data[new_uuid] = {}

# list for pet's name

names = ['Bailey', 'Bear', 'Buddy', 'Charlie', 'Coco', 'Cosmo', 'Daisy', 'Lily', 'Lola', 'Maggy', 'Mango', 'Molly', 'Sadie', 'Sophie', 'Sunny', 'Tiger', 'Toby']

# randomly select a name
name = names[random.randint(0, len(names)-1)]

# randomly select a dog or cat
species = ['dog', 'cat'][random.randint(0, 1)]

# randomly select the sex of the pet
sex = ['male', 'female'][random.randint(0, 1)]

# NOTE: The value for an Elasticsearch JSON document's field MUST
# be a string, regardless of the actual data type

age = str(random.randint(0, 10)) # randomly select the age of the pet
# put all of the values into the dictionary

test_data[new_uuid]['uuid'] = new_uuid
test_data[new_uuid]['name'] = name
test_data[new_uuid]['species'] = species
test_data[new_uuid]['sex'] = sex
test_data[new_uuid]['age'] = age

# print the UUID and name of the JSON document
print (name + " has UUID: " + new_uuid)

Serialize the Python Dictionary with Pickle for Later Use:

Optionally, you can also serialize (using pickle.dump) the randomized data if you want to store it for later use:

1
2
3

# serialize the data
with open('pets.pickle', 'wb') as handle:
pickle.dump(test_data, handle, protocol=pickle.HIGHEST_PROTOCOL)

If you don’t want to create any test data, but you already have data serialized, pickle also has a load method to access it:

1 2	with open('pets.pickle', 'rb') as handle: test_data = pickle.load(handle)

Create the Elasticsearch Documents from the Dictionary:

Copy the dictionary’s keys and their respective values. Then from those values, create a new JSON string that is compatible with an Elasticsearch JSON file. Next, you can use the Python json library’s built-in method called dumps to convert any Python dictionaries into JSON objects. This code will duplicate the dictionary values and make a new string to represent the Elasticsearch document. Finally, all of the documents will be put into an empty list like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

# create an empty list for the JSON documents
document_list = []

# iterate over the keys and values of the nested dictionary
for key, val in test_data.items():
# create a pet variable to represent the Elasticsearch document
pet = test_data[key]

# print a message about the pet
print ("Creating a JSON object for: " + pet['name'])

'''
Elasticsearch requires this 'index' field for every document. Each linebreak and "index" represents
a single JSON document. Use '\n' for linebreaks:
'''
json_string = '{"index":{}}\n' + json.dumps(pet) + '\n'
print ("New Elasticsearch document:" + json_string)

# append the document string to the list
document_list += [json_string]

'''
Elasticsearch requires this 'index' field for every document. Each linebreak and "index" represents
a single JSON document. Use '\n' for linebreaks:
'''
json_string = '{"index":{}}\n' + json.dumps(pet) + '\n'
print ("New Elasticsearch document:" + json_string)

# append the document string to the list
document_list += [json_string]

Copy the Document List and Append Each Line to a JSON File:

The list containing all the document strings is now ready to be put into a JSON document:

1
2
3
4

# Copy the list and write each document to a JSON file called "data.json"
for doc in document_list:
with open("data.json", 'a') as log:
log.write(doc)

Add the JSON Documents to an Elasticsearch Index:

Save the Python script and execute it in a terminal with python3:

1	python3 elastic_json.py

Execute Python Script using `python3`

Push the `.json` file to your server:

Use scp command in your terminal to push the JSON file to your server using a private key, but (make sure you replace {SERVER_IP} with the correct web server’s IP address:

1	sudo scp -i /path/to/private/key data.json root@{SERVER_IP}:/some/directory/on/server

Use the Elasticsearch `_bulk_` API to Push the Data to an Index:

Use a PUT cURL request to put the JSON data you created with our Python script into an Elasticsearch index:

1	curl -s -XPUT 'localhost:9200/pets/pets/_bulk?pretty' -H 'Content-Type: application/json' --data-binary @data.json

Voilà! Using the Kibana Console UI we can verify that the data is indeed in the index. There are now 10 documents in the index:

The 'Pets' Index as Viewed in the Kibana Console UI

Conclusion

Once you have finished following the procedure explained here, you will have a much better understanding of how to manipulate your Python files for Elastic use. This opens up a whole new resource for adding to your Elasticsearched database network. In addition, it also creates a connection to a host of other programs that are already producing data in Python. Thus, this new ability enables you to begin constructing a more functional network across multiple datatypes, which can all be rapidly accessed with the Elasticsearch service. To see what else can be done with this remarkable platform, look at our other operations and installations we’ve made walkthroughs for.

Just the Code

NOTE: The complete Python script to create randomized JSON objects to use as Finished Elasticsearch Documents Python Script:

If all you need is the technical stuff, we have the isolated script in its entirety, for your convenience below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79

#!/usr/bin/env python3
#-*- coding: utf-8 -*-
import json, uuid, random
import os, pickle

# create an empty dictionary
test_data = {}
# specify the number of keys (or 'documents') to create
total_documents = 10
print ("\n\nCreating " + str(total_documents) + " Elasticsearch documents:")
for key in range(total_documents):
# give each pet document a UUID
new_uuid = uuid.uuid4()

# convert UUID to string
new_uuid = str(new_uuid)

# create a nest dictionary which represents the Elasticsearch document
test_data[new_uuid] = {}

# list for pet's name
names = ['Bailey', 'Bear', 'Buddy', 'Charlie', 'Coco', 'Cosmo', 'Daisy', 'Lily', 'Lola', 'Maggy', 'Mango', 'Molly', 'Sadie', 'Sophie', 'Sunny', 'Tiger', 'Toby']
# randomly select a name
name = names[random.randint(0, len(names)-1)]

# randomly select a dog or cat
species = ['dog', 'cat'][random.randint(0, 1)]

# randomly select the sex of the pet
sex = ['male', 'female'][random.randint(0, 1)]

# NOTE: The value for an Elasticsearch JSON document's field MUST
# be a string, regardless of the actual data type
age = str(random.randint(0, 10)) # randomly select the age of the pet

# put all of the values into the dictionary
test_data[new_uuid]['uuid'] = new_uuid
test_data[new_uuid]['name'] = name
test_data[new_uuid]['species'] = species
test_data[new_uuid]['sex'] = sex
test_data[new_uuid]['age'] = age

# print the UUID and name of the JSON document
print (name + " has UUID: " + new_uuid)

# serialize the data
with open('pets.pickle', 'wb') as handle:
pickle.dump(test_data, handle, protocol=pickle.HIGHEST_PROTOCOL)

'''
with open('pets.pickle', 'rb') as handle:
test_data = pickle.load(handle)
'''

# create an empty list for the JSON documents
document_list = []

# iterate over the keys and values of the nested dictionary
for key, val in test_data.items():
# create a pet variable to represent the Elasticsearch document
pet = test_data[key]

# print a message about the pet
print ("Creating a JSON object for: " + pet['name'])

'''
Elasticsearch requires this 'index' field for every document. Each linebreak and "index" represents
a single JSON document. Use '\n' for linebreaks:
'''
json_string = '{"index":{}}\n' + json.dumps(pet) + '\n'
print ("New Elasticsearch document:" + json_string)

# append the document string to the list
document_list += [json_string]

# iterate over the list and write each document to a JSON file called "data.json"
for doc in document_list:
with open("data.json", 'a') as log:
log.write(doc)

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Guide How To Create Test Data In Python And Convert It Into An Elasticsearch JSON File

Introduction

Prerequisites:

Structure of a Python Nested Dictionary:

Setup the Python Script:

Edit the Python Script:

Importing Python Modules:

Create a Randomized Python Dictionary:

Serialize the Python Dictionary with Pickle for Later Use:

Create the Elasticsearch Documents from the Dictionary:

Copy the Document List and Append Each Line to a JSON File:

Add the JSON Documents to an Elasticsearch Index:

Push the `.json` file to your server:

Use the Elasticsearch `_bulk_` API to Push the Data to an Index:

Conclusion

Just the Code

Pilot the ObjectRocket Platform Free!

Keep in the know!

Services

Platform

Company

Resources

Support

Guide How To Create Test Data In Python And Convert It Into An Elasticsearch JSON File

Introduction

Prerequisites:

Structure of a Python Nested Dictionary:

Setup the Python Script:

Edit the Python Script:

Importing Python Modules:

Create a Randomized Python Dictionary:

Serialize the Python Dictionary with Pickle for Later Use:

Create the Elasticsearch Documents from the Dictionary:

Copy the Document List and Append Each Line to a JSON File:

Add the JSON Documents to an Elasticsearch Index:

Push the .json file to your server:

Use the Elasticsearch _bulk_ API to Push the Data to an Index:

Conclusion

Just the Code

Pilot the ObjectRocket Platform Free!

Keep in the know!

Services

Platform

Company

Resources

Support

Push the `.json` file to your server:

Use the Elasticsearch `_bulk_` API to Push the Data to an Index: