Migrate Elasticsearch to CockroachDB


Introduction

In this instructional article, we will migrate Elasticsearch to CockroachDB with Python code. We will walk you through each step of the way, starting with the libraries you will need to install and reference in your Python scripts in order to open an Elasticsearch database and read from it, as well as the libraries needed to open and write to your CockroachDB database.

Prerequisites

We will assume you have a basic understanding of the following:

  • Python coding.
  • Elasticsearch and NoSQL. It’s okay if you are a total newbie to Elasticsearch and NoSQL; if you follow along below, you will learn enough to understand how it all works.
  • CockroachDB.
  • SQL.

In case you are new to Elasticsearch and/or the NoSQL “structure”: Elasticsearch is all about documents, meaning that it stores data as “documents” and indexes that data automatically so all of the information is quickly and easily searchable. With Elasticsearch you can easily index, search, sort, and filter documents. This is a primary strength of NoSQL-type databases.

Some Elasticsearch terminology is important to know because it can differ from what we learned in the SQL / relational database model: in Elasticsearch, an index is like a database in SQL databases such as CockroachDB and Postgres. It is a place to store related documents. To retrieve data, we need three pieces of information:

  • Index: Database.
  • Datatype: Type of NoSQL document.
  • Id: ID of the document.
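To make that concrete, here is a small sketch (the names and ID are hypothetical examples, not values from this tutorial’s database) showing how those three pieces combine into the REST path Elasticsearch uses to address a single document:

```python
# Sketch: how index, datatype, and id combine into the REST path
# Elasticsearch uses to address one document.
def document_path(index, doc_type, doc_id):
    # e.g. GET /noSQLDB/user/1a354b3c12f92b28b8d65f
    return "/" + index + "/" + doc_type + "/" + doc_id

print(document_path("noSQLDB", "user", "1a354b3c12f92b28b8d65f"))
# /noSQLDB/user/1a354b3c12f92b28b8d65f
```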

First, we will examine a test database in Elasticsearch whose data we want to migrate into CockroachDB.

Elasticsearch structure and data

Here are some rows (known in NoSQL as “documents”) from the users “table” (known in NoSQL as a “collection”) in our Elasticsearch database, which is named “noSQLDB”.

"_id" : ObjectId("1a354b3c12f92b28b8d65f"),
"kind" : "user",
"name" : "Jim Thance",
"phone" : "511-225-3248",
"age" : "31"

"_id" : ObjectId("93d83a8a8ab6558b8d3b5"),
"kind" : "user",
"name" : "Martin Starc",
"phone" : "511-535-6157",
"age" : "52"

"_id" : ObjectId("3f744a2c21965e8b8d64f"),
"kind" : "user",
"name" : "Shoret Brieferson",
"phone" : "511-413-6219",
"age" : "18"

IMPORTANT NOTE: For your ease, we have given the above Elasticsearch JSON index a consistent structure, which you cannot always count on in your NoSQL databases.
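Because that consistency is not guaranteed, a defensive read like the sketch below (using dict `.get()` with fallback values) can protect a migration from documents that are missing a field. This is an illustrative example, not code from this tutorial’s scripts:

```python
# Sketch: read fields defensively from a document that may be missing keys.
doc = {"kind": "user", "name": "Jim Thance"}  # note: no "age" field

name = doc.get("name", "")   # fall back to an empty string if absent
age = doc.get("age", "0")    # fall back to "0" when age is absent

print(name, age)  # Jim Thance 0
```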

Python code for migration

For our scripts to access that NoSQL data using Python, we will reference the elasticsearch (Elasticsearch PY) and elasticsearch-dsl libraries.

from flask import Flask
from flask import render_template # used later to send the user to an error page.
from flask import redirect # used at the end to redirect the user.
import psycopg2 # for the Cockroach database connection
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

CREDIT: we have used Elasticsearch PY, which can be found through https://pypi.org.
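If you have not installed these client libraries yet, they can typically be pulled from PyPI with pip (package names as published on pypi.org; your environment may differ):

```shell
pip install flask psycopg2-binary elasticsearch elasticsearch-dsl
```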

Next, let’s set up the Elasticsearch database login, credentials, etc. and set up a connection to the “noSQLDB” Elasticsearch database.

txtHostURL = "localhost"
txtPortNum = 9200
txtDBnameES = "noSQLDB" # "index" in Elasticsearch terminology
t_user = "Elasticsearchadmin"
t_pw = "[Elasticsearch database user password here]"
clientES = Elasticsearch(
    [txtHostURL],
    http_auth=(t_user, t_pw),
    port=txtPortNum
)

Analysis

Looking above, you can see we created an object called “clientES” that we will use later to send a data request to Elasticsearch.

Below, we set up CockroachDB credentials and create a connection to the SQL database we will use to write to later.

txtDBname = "db name"
txtNameUser = "db user name"
txtSSLmode = "verify-full"
txtSSLrootCert = 'certifs/ca.crt'
t_sslkey = 'certifs/client.maxroach.key'
t_sslcert = 'certifs/client.maxroach.crt'
txtHostURL = "localhost"
txtPortNum = "26257"
connCRDB = psycopg2.connect(database=txtDBname, user=txtNameUser,
    sslmode=txtSSLmode, sslrootcert=txtSSLrootCert,
    sslkey=t_sslkey, sslcert=t_sslcert,
    host=txtHostURL, port=txtPortNum)
cursorCRDB = connCRDB.cursor()

Analysis

We created the “cursorCRDB” object for later use in executing an SQL command on the Cockroach database.
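One assumption baked into the rest of this tutorial is that a “tblUsers” table already exists on the CockroachDB side. If yours does not, a minimal sketch of the DDL might look like this; the column names match the INSERT statement used later, but the types (STRING / INT) are assumptions about your schema:

```python
# Sketch: create the target table if it does not exist yet.
# Column names match the INSERT used later in this tutorial;
# the STRING / INT types are assumptions about your schema.
ddlUsers = (
    "CREATE TABLE IF NOT EXISTS tblUsers "
    "(txtName STRING, intAge INT)"
)
# cursorCRDB.execute(ddlUsers)  # run once against your Cockroach database
```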

Next we will set up two arrays, one for user names and one for user ages.

arrUsersNames = []
arrUsersAges = []

Then we get all documents (rows) from users collection (table) where the “kind” field’s value is “user”.

s = Search(index=txtDBnameES).using(clientES).query("match", kind="user")
Elasticsearch_results = s.execute()
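For reference, the Search call above corresponds (roughly) to this raw Elasticsearch query body, built here by hand so you can see what elasticsearch-dsl sends over the wire:

```python
# Sketch: the raw query body that the
# Search(...).query("match", kind="user") call roughly translates into.
query_body = {
    "query": {
        "match": {
            "kind": "user"
        }
    }
}
print(query_body)
```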

Next: read “name” and “age” from all three of those returned documents into the Python arrays we prepared.

for row in Elasticsearch_results:
    arrUsersNames.append(row.name)
    arrUsersAges.append(row.age)

Analysis

  • “for…”: This is a FOR loop that cycles through the rows in the “Elasticsearch_results” collection.
  • “append(row.name)”: Adds the current user’s name to the names array for later insertion into our Cockroach database.
  • “append(row.age)”: Adds the current user’s age to the ages array for insertion into our CockroachDB users table.

Next, we build the SQL to later execute on our Cockroach database using the cursor we created earlier.

s = ""
s += "INSERT INTO tblUsers "
s += "("
s += "txtName"
s += ", intAge"
s += ")"
s += " VALUES"
s += " ("
s += "%s"
s += ", %s"
s += ")"

Analysis

  • “INSERT INTO tblUsers…”: This code adds records to the “tblUsers” table in Cockroach.
  • “txtName” and “intAge”: We’re telling CockroachDB which two columns will receive the data.
  • “VALUES”: This tells CockroachDB to insert the values that come next into “tblUsers”.
  • “arrUsersNames” and “arrUsersAges”: The source of the actual values to insert: the two arrays we filled from the Elasticsearch source earlier, which psycopg2 plugs in when the SQL executes.
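Since the two arrays are parallel (index 0 of each belongs to the same user), one way to pair them into per-row tuples for insertion is Python’s zip(). This is a sketch with sample data, not part of the original scripts:

```python
# Sketch: pair the two parallel arrays into one (name, age) tuple per row,
# the shape psycopg2's executemany() expects for a 2-column INSERT.
arrUsersNames = ["Jim Thance", "Martin Starc", "Shoret Brieferson"]
arrUsersAges = ["31", "52", "18"]

rows = list(zip(arrUsersNames, arrUsersAges))
print(rows)
# [('Jim Thance', '31'), ('Martin Starc', '52'), ('Shoret Brieferson', '18')]
```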

Finally: script execution, error checking, object/connection housekeeping, and the redirect.

# Here we catch and display any errors that might occur
#   while TRYing to execute and commit our SQL.
try:
    cursorCRDB.executemany(s, list(zip(arrUsersNames, arrUsersAges)))
    connCRDB.commit()
except psycopg2.Error as e:
    # Build a message to send to the user with potentially useful data, including
    #    the error message and the SQL that caused the error.
    txtMessage = "Database error: " + str(e) + "\nSQL: " + s
    # Notice: you may wish to build a template page called "errors.html" for dealing with errors.
    #   We are also sending a parameter value along with our call to the HTML error display page.
    return render_template("errors.html", message=txtMessage)

# They made it this far, meaning a success.
# Clean up our Cockroach database connection and cursor object.
cursorCRDB.close()
connCRDB.close()

# Redirect the user to the rest of your Python application:
return redirect("https://your-site-address-here", code=302)

Source code

Here’s the final version of your Python script! IMPORTANT: studying the script below matters for more reasons than one. For example, you may need to install some new Python libraries, namely “flask”, “psycopg2”, “elasticsearch”, and “elasticsearch-dsl”.

from flask import Flask
from flask import render_template # used later to send the user to an error page.
from flask import redirect # used at the end to redirect the user.
import psycopg2 # for the Cockroach database connection
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

app = Flask(__name__)

# Elasticsearch database credentials
txtHostURL = "localhost"
txtPortNum = 9200
txtDBnameES = "noSQLDB" # "index" in Elasticsearch terminology
t_user = "Elasticsearchadmin"
t_pw = "[Elasticsearch database user password here]"
clientES = Elasticsearch(
    [txtHostURL],
    http_auth=(t_user, t_pw),
    port=txtPortNum
)

# Cockroach database credentials
txtDBname = "db name"
txtNameUser = "db user name"
txtSSLmode = "verify-full"
txtSSLrootCert = 'certifs/ca.crt'
t_sslkey = 'certifs/client.maxroach.key'
t_sslcert = 'certifs/client.maxroach.crt'
txtHostURL = "localhost"
txtPortNum = "26257"
connCRDB = psycopg2.connect(database=txtDBname, user=txtNameUser,
    sslmode=txtSSLmode, sslrootcert=txtSSLrootCert,
    sslkey=t_sslkey, sslcert=t_sslcert,
    host=txtHostURL, port=txtPortNum)
cursorCRDB = connCRDB.cursor()

def migrate():
    # Build the arrays here from the Elasticsearch users collection
    arrUsersNames = []
    arrUsersAges = []

    # Pull all documents (rows) from the users collection (table)
    s = Search(index=txtDBnameES).using(clientES).query("match", kind="user")
    Elasticsearch_results = s.execute()

    for row in Elasticsearch_results:
        arrUsersNames.append(row.name)
        arrUsersAges.append(row.age)

    # Data validation could go here.

    # Send the data in our arrays to the Cockroach database, using
    # %s placeholders so psycopg2 escapes the values for us.
    s = ""
    s += "INSERT INTO tblUsers "
    s += "(txtName, intAge)"
    s += " VALUES"
    s += " (%s, %s)"

    # Here we catch and display any errors that occur
    #   while TRYing to execute and commit our SQL.
    try:
        cursorCRDB.executemany(s, list(zip(arrUsersNames, arrUsersAges)))
        connCRDB.commit()
    except psycopg2.Error as e:
        # Create a message to send to the user with valuable data, including
        #    the error message and the SQL that caused the error.
        txtMessage = "Database error: " + str(e) + "\nSQL: " + s
        # Notice: you may want to create a template page called "errors.html"
        #   for handling errors. We are also sending a parameter along with
        #   our call to the page.
        return render_template("errors.html", message=txtMessage)

    # They got this far, meaning they were successful.
    # Clean up our Cockroach database connection and cursor object
    cursorCRDB.close()
    connCRDB.close()

    # Redirect the user to the rest of your application
    return redirect("https://your-site-address-here", code=302)

if __name__ == "__main__":
    migrate()

Miscellaneous

To keep this tutorial as simple as possible, we chose to leave out data validation, meaning we recommend you check the data pulled from Elasticsearch before attempting to insert it into your Cockroach database. This is particularly important when dealing with “semi-structured” Elasticsearch NoSQL databases, where you cannot 100% count on consistency.
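As a starting point, a validation pass could look something like this sketch. The specific rules here (non-empty name, age parses to a non-negative integer) are assumptions you should adapt to your own data:

```python
# Sketch: validate one (name, age) pair before inserting into CockroachDB.
# The specific rules are assumptions; tailor them to your own data.
def is_valid_user(name, age):
    if not isinstance(name, str) or name.strip() == "":
        return False
    try:
        return int(age) >= 0
    except (TypeError, ValueError):
        return False

print(is_valid_user("Jim Thance", "31"))   # True
print(is_valid_user("", "31"))             # False
print(is_valid_user("Martin Starc", "??")) # False
```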

Conclusion

In this tutorial, we investigated how to migrate an Elasticsearch NoSQL database into a Cockroach database using Python. We showed script samples along the way and explained in detail how and why we did each step. We even took a brief look at some JSON. As part of the process, we used Cockroach’s INSERT statement, Python lists (arrays), and for loops.
