Insert JSON Data into PostgreSQL using Python (Part 2)

Introduction

This is part two of an article series that demonstrates how to load JSON data from a file, convert it into Python objects, and parse and format those objects into an SQL-compliant string that, when executed, inserts record data into a PostgreSQL table. In this article you’ll learn how to use Python to open a .json file, read and parse its data, and build the SQL string.

Prerequisites

You will need a stable version of Python 3 installed (the code in this article has not been tested on Python 2.7), and you’ll need to install the psycopg2 Python adapter for Postgres as well if you intend to execute the final SQL statement.

Execute the following bash command in a terminal or command prompt window to instruct PIP to install the adapter:

pip3 install psycopg2

In the last article of this series we created some JSON data for the script to parse and format. Make sure you have an array of JSON objects that can be used as Postgres table data before attempting to run the examples found in this article.
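
If you no longer have that file handy, a minimal postgres-records.json matching the sample data used throughout this article (same column names, invented values) would look something like this:

[
    {
        "id": "1",
        "str_col": "string data",
        "int_col": 1435,
        "bool_col": true,
        "float_col": 25.11621924571661
    },
    {
        "id": "2",
        "str_col": "more strings",
        "int_col": 3774,
        "bool_col": false,
        "float_col": 30.696120169189474
    }
]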

Use Python to parse JSON

In part one we created a Python script and imported the following libraries to help read, parse, and insert some JSON data into a Postgres table:

# import Python's built-in JSON library
import json, sys

# import the psycopg2 database adapter for PostgreSQL
from psycopg2 import connect, Error

Use Python to read a JSON file

The following example code opens a .json file and parses its data into Python objects by passing the file object returned by open() to the json.load() method:

# use Python's open() function to load the JSON data
with open('postgres-records.json') as json_data:

    # use load() rather than loads() for JSON files
    record_list = json.load(json_data)

NOTE: To convert a JSON-compliant string into Python objects use the json.loads() method instead. The “s” at the end of loads stands for “string”.
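
For example, here’s a quick throwaway sketch (not part of the script) showing json.loads() parsing a JSON-formatted string:

import json

# loads() parses a JSON string rather than a file object
parsed = json.loads('[{"id": "1", "str_col": "string data"}]')
print(type(parsed))  # <class 'list'>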

Use the following code to verify that the object returned by json.load() is a list, and then grab the first potential Postgres “record”:

if type(record_list) == list:
    first_record = record_list[0]

Now let’s use the Python dictionary’s keys() method to retrieve its “column” names in the form of a list:

    # get the column names from the first record
    columns = list(first_record.keys())
    print ("\ncolumn names:", columns)

You’ll need that column data later when concatenating the INSERT INTO SQL string. Here’s another way you can get the column names using a one-liner:

columns = [list(x.keys()) for x in record_list][0]
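
Note that this one-liner builds a list of key lists for every record and then keeps only the first one. If you only need the first record’s keys, the simpler equivalent below (a sketch that reuses the same record_list object) avoids the extra work:

# equivalent to the one-liner above, but it only reads the first record
columns = list(record_list[0].keys())

# given the sample data used in this article, 'columns' would contain
# something like: ['id', 'str_col', 'int_col', 'bool_col', 'float_col']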

Declare an SQL string for Postgres records

Let’s start by declaring the string for the SQL statement. The following code will declare the string and put the table name and joined column names into it:

table_name = "json_data"
sql_string = 'INSERT INTO {} '.format( table_name )
sql_string += "(" + ', '.join(columns) + ")\nVALUES "

NOTE: Make sure that your Postgres table name doesn’t use any spaces or hyphens.

The above example code inserts the table name into the string using Python’s str.format() method.
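
At this point sql_string holds the beginning of the statement. Here’s a quick sketch of what printing it would show, assuming the json_data table and the column names from this article’s sample data:

print(sql_string)

# expected output (the VALUES list is still empty at this point):
# INSERT INTO json_data (id, str_col, int_col, bool_col, float_col)
# VALUES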

Use Python to parse the JSON records

This next part is the “meat and potatoes” of the script: it parses the Python dict objects containing your Postgres record data, then formats their values and appends them to the SQL string in a way that keeps the statement SQL-compliant.

Using Python to parse a JSON object

This first example enumerates over the list of dictionaries (Postgres records) and each of their respective values, casting each value to an SQL-compliant string before joining them together:

# enumerate over the records
for i, record_dict in enumerate(record_list):

    # iterate over the values of each record dict object
    values = []
    for col_names, val in record_dict.items():

        # Postgres strings must be enclosed with single quotes
        if type(val) == str:
            # escape apostrophes by doubling the single quotes
            val = val.replace("'", "''")
            val = "'" + val + "'"

        values += [ str(val) ]

The above code evaluates each record value and, if it’s a string, formats it properly before appending it to a new “values” list.

The last step is to join each record’s values, enclose them in parentheses, and finally end the SQL statement with a semicolon:

    # join the list of values and enclose the record in parentheses
    sql_string += "(" + ', '.join(values) + "),\n"

# remove the last comma and end statement with a semicolon
sql_string = sql_string[:-2] + ";"

Another Python and JSON parser example

You can also just put all of the values into a nested Python list using this one-liner:

# create a nested list of the records' values
values = [list(x.values()) for x in record_list]
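
Printing the first element of that nested list should show the first record’s values in their original key order. Here’s a quick sketch using the sample records defined later in this article:

print(values[0])

# e.g. ['cbf071a789d04f25b28b95976611c876', "data's positronic brain data's positronic brain", 1435, True, 25.11621924571661]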

After that, declare an empty string for the record values, then enumerate over the records’ values, passing each string through psycopg2’s Json adapter so that single quotes and apostrophes are escaped:

# the Json adapter used below comes from psycopg2.extras
from psycopg2.extras import Json

# value string for the SQL string
values_str = ""

# enumerate over the records' values
for i, record in enumerate(values):

    # declare empty list for values
    val_list = []
   
    # append each value to a new list of values
    for v, val in enumerate(record):
        if type(val) == str:
            val = str(Json(val)).replace('"', '')
        val_list += [ str(val) ]

    # put parentheses around each record string
    values_str += "(" + ', '.join( val_list ) + "),\n"

# remove the last comma and end SQL with a semicolon
values_str = values_str[:-2] + ";"

NOTE: Recent versions of the psycopg2 library (2.5 and later) include the Json adapter, which provides better JSON support. In a terminal or command-prompt tab you can run the pip3 list command to have PIP return all of its installed Python packages along with their respective version numbers.
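
To see why the .replace('"', '') call is needed, here’s a quick sketch (assuming psycopg2 is installed) of what str(Json(...)) produces for a string that contains an apostrophe:

from psycopg2.extras import Json

val = "data's positronic brain"

# Json() serializes the value with json.dumps() and quotes it for SQL, so the
# apostrophe comes back doubled and the result still carries the double quotes
# added by the JSON serialization
print(str(Json(val)))                   # something like: '"data''s positronic brain"'

# stripping the double quotes leaves a plain SQL string literal
print(str(Json(val)).replace('"', ''))  # 'data''s positronic brain'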

The enumeration loop above appends each joined value list to the values_str string, which you can then combine with the column names and table name to complete the SQL statement using the following string formatting:

# concatenate the SQL string
table_name = "json_data"
sql_string = "INSERT INTO %s (%s)\nVALUES %s" % (
    table_name,
    ', '.join(columns),
    values_str
)

The above code uses %s placeholders to insert the table name, the joined column names, and the values string into the SQL statement.
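
If you’re running Python 3.6 or newer, an f-string is an equally valid way to build the same statement; the sketch below is a stylistic alternative, not part of the original script:

# equivalent SQL string built with an f-string (Python 3.6+)
sql_string = f"INSERT INTO {table_name} ({', '.join(columns)})\nVALUES {values_str}"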

Insert JSON data into PostgreSQL

If you just want to output the results to the terminal, use Python’s print() function to see how the string turned out. Put the following code at the end of your script to have Python print the concatenated SQL string to a terminal window:

print ("\nSQL statement:")
print (sql_string)

psycopg2 insert example

This next part of the Python code is optional since you may want to inspect or alter the SQL string output before having psycopg2 insert it.

Code for psycopg2 to connect to Postgres

If you want to execute the SQL string from the script instead, you can append the following example code, which uses the psycopg2 library to connect to a PostgreSQL database:

try:
    # declare a new PostgreSQL connection object
    conn = connect(
        dbname = "python_data",
        user = "postgres",
        host = "192.168.100.123",
        password = "1234",
        # attempt to connect for 3 seconds then raise exception
        connect_timeout = 3
    )

    cur = conn.cursor()
    print ("\ncreated cursor object:", cur)

except (Exception, Error) as err:
    print ("\npsycopg2 connect error:", err)
    conn = None
    cur = None

Use psycopg2 to insert JSON data

Once you’ve created a connection, you can use the psycopg2 cursor’s execute() method to insert the JSON data by executing the string and committing the transaction:

# only attempt to execute SQL if cursor is valid
if cur is not None:

    try:
        cur.execute( sql_string )
        conn.commit()

        print ('\nfinished INSERT INTO execution')

    except (Exception, Error) as error:
        print("\nexecute_sql() error:", error)
        conn.rollback()

    # close the cursor and connection
    cur.close()
    conn.close()
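
If you’d like a quick sanity check after the INSERT, the cursor’s standard DB-API rowcount attribute reports how many rows the last execute() call affected. Here’s a minimal sketch, meant to go right after the conn.commit() call inside the try block:

        # optional: rowcount holds the number of rows affected by the
        # last execute() call (-1 if the count is unavailable)
        print("rows inserted:", cur.rowcount)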

Run the Python script from command line

When the script is complete you can then save the code and run it by passing the script’s name to the python3 command in a terminal window:

python3 insert_json.py

(Screenshot: terminal output of the insert JSON data into PostgreSQL using Python example)

If you want to remove all the inserted records from the Postgres table, you can execute a TRUNCATE TABLE statement in psql, followed by the table name (for example, TRUNCATE TABLE json_data;), and all of the records will be deleted.

Conclusion

You should now have some Python-generated SQL for your Postgres table! Depending on how your source JSON data is constructed, the final SQL string returned by the script should look something like this:

INSERT INTO json_data (id, str_col, int_col, bool_col, float_col)
VALUES ('1', 'string data', 1435, TRUE, 25.11621924571661),
('2', 'more strings', 3774, FALSE, 30.696120169189474);

If you’d like to save the script’s output to a file just append the following Python code to the end of your script:

with open('insert-json.sql', 'w') as output_file:
    output_file.write( sql_string )

NOTE: The 'w' file mode will simply truncate and re-write data to the .sql file every time you run the script, but you can also use the 'a' mode to append data to the end of the SQL file instead.

Just the Code

Here’s the complete code you can use to load JSON from a file and convert it into a Postgres-compatible SQL string for the purpose of inserting records into a table:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

# import the psycopg2 database adapter for PostgreSQL
from psycopg2 import connect, Error

# import Python's built-in JSON library
import json

# import the JSON library from psycopg2.extras
from psycopg2.extras import Json

# import psycopg2's 'json' using an alias
from psycopg2.extras import json as psycop_json

# import Python's 'sys' library
import sys

# accept command line arguments for the Postgres table name
if len(sys.argv) > 1:
    table_name = '_'.join(sys.argv[1:])
else:
    # ..otherwise revert to a default table name
    table_name = "json_data"

print ("\ntable name for JSON data:", table_name)

# use Python's open() function to load the JSON data
with open('postgres-records.json') as json_data:

    # use load() rather than loads() for JSON files
    record_list = json.load(json_data)

print ("\nrecords:", record_list)
print ("\nJSON records object type:", type(record_list)) # should return "<class 'list'>"

# concatenate an SQL string
sql_string = 'INSERT INTO {} '.format( table_name )

# if record list then get column names from first key
if type(record_list) == list:
    first_record = record_list[0]

    columns = list(first_record.keys())
    print ("\ncolumn names:", columns)

# if just one dict obj or nested JSON dict
else:
    print ("Needs to be an array of JSON objects")
    sys.exit()

# enclose the column names within parentheses
sql_string += "(" + ', '.join(columns) + ")\nVALUES "

# enumerate over the records
for i, record_dict in enumerate(record_list):

    # iterate over the values of each record dict object
    values = []
    for col_names, val in record_dict.items():

        # Postgres strings must be enclosed with single quotes
        if type(val) == str:
            # escape apostrophes by doubling the single quotes
            val = val.replace("'", "''")
            val = "'" + val + "'"

        values += [ str(val) ]

    # join the list of values and enclose the record in parentheses
    sql_string += "(" + ', '.join(values) + "),\n"

# remove the last comma and end statement with a semicolon
sql_string = sql_string[:-2] + ";"

print ("\nSQL string:")
print (sql_string)

Using psycopg2’s Json library

Here’s another complete example that uses psycopg2’s Json adapter to convert string data into SQL-compliant strings. It loads the JSON data from a string inside the script, using json.loads(), instead of opening a .json file:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

# import Python's JSON lib
import json

# import the Json adapter from psycopg2.extras
from psycopg2.extras import Json

# use JSON loads to create a list of records
record_list = json.loads('''[
{
    "id": "cbf071a789d04f25b28b95976611c876",
    "str_col": "data's positronic brain data's positronic brain",
    "int_col": 1435,
    "bool_col": true,
    "float_col": 25.11621924571661
},
{
    "id": "9e5a9557bbac4c94bdb7cb8ef845cf29",
    "str_col": "Postgres' psycopg2 adapter examples examples",
    "int_col": 3774,
    "bool_col": false,
    "float_col": 30.696120169189474
},
{
    "id": "ef3051e93c034d16b42d2915d13e4a7a",
    "str_col": "type orkb's tutorials orkb bar ObjectRocket's tutorials",
    "int_col": 1586,
    "bool_col": false,
    "float_col": 4.443295005013988
}
]'''
)


# create a nested list of the records' values
values = [list(x.values()) for x in record_list]

# get the column names
columns = [list(x.keys()) for x in record_list][0]

# value string for the SQL string
values_str = ""

# enumerate over the records' values
for i, record in enumerate(values):

    # declare empty list for values
    val_list = []
   
    # append each value to a new list of values
    for v, val in enumerate(record):
        if type(val) == str:
            val = str(Json(val)).replace('"', '')
        val_list += [ str(val) ]

    # put parentheses around each record string
    values_str += "(" + ', '.join( val_list ) + "),\n"

# remove the last comma and end SQL with a semicolon
values_str = values_str[:-2] + ";"

# concatenate the SQL string
table_name = "json_data"
sql_string = "INSERT INTO %s (%s)\nVALUES %s" % (
    table_name,
    ', '.join(columns),
    values_str
)
