Migrate MongoDB NoSQL to Postgres with Python

Introduction

In this tutorial, we will migrate some MongoDB NoSQL to Postgres with Python scripting. We’ll hold your hand each step of the way, beginning with the libraries you will need to install and reference in your Python code in order to open a Mongo database and read from it, as well as the libraries needed to open and write to your PostgreSQL data.

Assumptions and prerequisites

We’ll assume you have a basic understanding of the following:

  • Python coding language.
  • MongoDB.
  • PostgreSQL.
  • Optional: Tutorial on naming conventions that explains why we prefix our variables, column names, table names, etc. (all objects) as you see us doing here. For example, naming “tphrase_original” with the “t” you see at the beginning in order to delineate it as a “text” object and “tbl_” before table names in order to clearly mark those objects as tables.

To begin, let’s take a look at the test database – in JSON – that we want to migrate over to PostgreSQL.

MongoDB structure and data

Here are some documents (records/rows) from the users collection (table) from our Mongo database called “testDB”.

"_id" : ObjectId("2c684a3c12f92b28b8d39a"),
"name" : "Mick Frankson",
"phone" : "512-211-3288",
"age" : "25"

"_id" : ObjectId("62b83b8a8ab6558b8d373"),
"name" : "Sue Menow",
"phone" : "512-225-6777",
"age" : "17"

"_id" : ObjectId("44717f2c21965e8b8d5d2"),
"name" : "Jim Punjacks",
"phone" : "512-391-5445",
"age" : "39"

Please note: For your ease, we have given the above Mongo JSON-formatted collection a consistent structure, which you can not always count on in a NoSQL database.

Start writing our Python

In order to read that data from Python, we’ll reference the pymongo library and MongoClient.

import pymongo
from pymongo import MongoClient

Next, let’s set up our Mongo database credentials and create a connection to the database.

t_host = "localhost"
t_port = "27017" # Mongo port (double-check to make sure this is the port your Mongo database is running on)
t_dbname = "testDB"
t_user = "mongoadmin"
t_pw = "[Mongo database user password here]"
client = MongoClient("mongodb://" + t_user + ":" + t_pw + "@" + t_host + ":" + t_port + "/" + t_dbname)
db_mongo = client.testDB
users_mongo = db_mongo.users

Analysis

Notice, here we have created an object called “users_mongo” for referencing the “users” collection. “Collections” in NoSQL are like tables in SQL. Later we will cycle through that table, looking at two fields, “name” and “age”.

Now we set up PostgreSQL credentials and create a connection to the SQL database we will use to write to later.

t_host = "[database address here]"
t_port = "5432" # Default postgres port
t_dbname = "[database name here]"
t_user = "[database user name here]"
t_pw = "[database user password here]"
db_conn = psycopg2.connect(host=t_host, port=t_port, dbname=t_dbname, user=t_user, password=t_pw)
db_cursor = db_conn.cursor()

Analysis

Above, we created the “db_cursor” object for later use in executing an SQL command.

Next we’ll set up two arrays, one for user names and one for user age.

array_users_name = []
array_users_age = []

Then we read all three documents (records) that exist in the users collection from Mongo into a Python array.

    mongo_results = users_mongo.find()
    for record in mongo_results:
        array_users_name.append(record['name'])
        array_users_age.append(record['age'])

Analysis

  • “find()”: With no parameters, this function pulls all records (documents) into a Python collection we are calling “mongo_results”.
  • “for…”: This is our loop to move through the 3 records in the “mongo_results” collection.
  • “append(record[‘name’])”: Adding an item to the array we use to store user names for later insertion into our Postgres database.
  • “append(record[‘age’])”: Adding an item to the array we use to store user ages for insertion into our PostgreSQL users table.

Next, we build the SQL we’ll end up sending to Postgres via the cursor we created earlier.

    s = ""
    s += "INSERT INTO tbl_users "
    s += "("
    s += "t_name"
    s += ", i_age"
    s += ")"
    s += " VALUES"
    s += "("
    s += "'" + array_users_name + "'"
    s += "," + array_users_age
    s += ")"

Analysis

  • “INSERT INTO tbl_users…”: This will add a row to tbl_users.
  • “t_name” and “i_age”: We’re preparing PostgreSQL to receive two columns of data and telling it where to put them.
  • “VALUES (“: This tells the Postgres database to insert the values that follow into “tbl_users”.
  • “array_users_name” and “array_users_age”: Supplies the actual values to insert, which in this case are the two arrays we filled from the MongoDB source earlier.

Finally, code execution, error checking, cleanup, and redirect.

    # Here we are catching and displaying any errors that occur
    #   while TRYing to commit the execute our SQL script.
    db_cursor.execute(s)
        try:
            db_conn.commit()
        except psycopg2.Error as e:
            # Create a message to send to the user with valuable information, including
            #    the error message and the SQL that caused the error.
            t_message = "Database error: " + e + "/n SQL: " + s
            # Notice: you may want to create a template page called "error.html" for handling errors.
            #   We are also sending a parameter along with our call to the page.
            return render_template("error.html", message = t_message)

        # They got this far, meaning success
        # Clean up our Postgres database connection and cursor object
        db_cursor.close()
        db_conn.close()

    # Redirect user to the rest of your application
    return redirect("http://your-URL-here", code=302)

The code

Here’s the final version of your Python code! IMPORTANT: When we say “study the code below,” this is important for more reasons than one. For example, you may need to install some new Python libraries, namely “flask”, “psycopg2”, “pymongo”, and “MongoClient”.

from flask import Flask
from flask import render_template # used later to send user to error page.
import psycopg2 # for Postgres database connection
import pymongo # for MongoDB database connection
from pymongo import MongoClient

app = Flask(__name__)

# Mongo Database credentials
t_host = "localhost"
t_port = "27017" # Mongo port
t_dbname = "testDB"
t_user = "mongoadmin"
t_pw = "[Mongo database user password here]"
client = MongoClient("mongodb://" + t_user + ":" + t_pw + "@" + t_host + ":" + t_port + "/" + t_dbname)
db_mongo = client.testDB
users_mongo = db_mongo.users

# Postgres Database credentials
t_host = "[database address here]"
t_port = "5432" # Default postgres port
t_dbname = "[database name here]"
t_user = "[database user name here]"
t_pw = "[database user password here]"
db_conn = psycopg2.connect(host=t_host, port=t_port, dbname=t_dbname, user=t_user, password=t_pw)
db_cursor = db_conn.cursor()

def migrate():
    # Build array_rows here from Mongo users collection
    array_users_name = []
    array_users_age = []

    # Pull all documents (records) from users collection (table)
    mongo_results = users_mongo.find()
    for record in mongo_results:
        array_users_name.append(record['name'])
        array_users_age.append(record['age'])

    # Data validation would go here.

    # Cleanup our database collection connection
    mongo_results.close()

    # SEND rows to the Postgres database
    s = ""
    s += "INSERT INTO tbl_users "
    s += "("
    s += "t_name"
    s += ", i_age"
    s += ")"
    s += " VALUES"
    s += "("
    s += "'" + array_users_name + "'"
    s += "," + array_users_age
    s += ")"
    # IMPORTANT WARNING: this format allows for a user to try to insert
    #   potentially damaging code, commonly known as "SQL injection".
    #   You will want to investigate methods for preventing this, including
    #   use of stored procedures.

    # Here we are catching and displaying any errors that occur
    #   while TRYing to commit the execution of our SQL script.
    db_cursor.execute(s)
        try:
            db_conn.commit()
        except psycopg2.Error as e:
            # Create a message to send to the user with valuable information, including
            #    the error message and the SQL that caused the error.
            t_message = "Database error: " + e + "/n SQL: " + s
            # Notice: you may want to create a template page called "error.html" for handling errors.
            #   We are also sending a parameter along with our call to the page.
            return render_template("error.html", message = t_message)

        # They got this far, meaning success
        # Clean up our Postgres database connection and cursor object
        db_cursor.close()
        db_conn.close()

    # Redirect user to the rest of your application
    return redirect("http://your-URL-here", code=302)

if __name__ == "__migrate__":
    migrate()

Miscellaneous

In the interest of keeping it simple, we left out data validation. Meaning, we recommend you validate the data pulled from the MongoDB before attempting to insert it into your Postgres database. This is expecially important when dealing with “structureless” NoSQL database systems where you can not count on consistency as much as you can with SQL databases. So please be aware of this and do your NULL checking!

Conclusion

In this tutorial, we walked you through learning how to Migrate MongoDB NoSQL to Postgres using Python. We showed code samples along the way and explained in detail how and why to do each step. We even had some exposure to JSON. For part of this article, we also used the PostgreSQL INSERT statement. In addition, we used Python Arrays and the For loop.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.