Insert JSON Data into PostgreSQL using Python (Part 2)
Introduction
This is part two of an article series that demonstrates how to load JSON data from a file and convert it into Python objects, which can then be parsed and formatted into an SQL-compliant string that, when executed, inserts record data into a PostgreSQL table. In this article you’ll learn how to use Python to open a .json file, then read and parse its data to create the SQL string.
Prerequisites
You will need a stable version of Python 3 installed (the code in this article has not been tested on Python 2.7), and you’ll also need to install the psycopg2 Python adapter for Postgres if you intend to execute the final SQL statement.
Execute the following bash command in a terminal or command prompt window to instruct PIP to install the adapter:
```shell
pip3 install psycopg2
```
In the last article of this series we created some JSON data that the code can use to parse and format. Make sure you have some JSON objects in an array that can be used as Postgres table data before attempting to use the examples found in this article.
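If you don’t have the data from part one handy, a short snippet like the following can generate a minimal postgres-records.json to experiment with. The column names and values here are placeholders, not the exact records from part one:

```python
import json

# hypothetical sample records; any flat JSON objects with matching keys will do
sample_records = [
    {"id": "1", "str_col": "string data", "int_col": 1435,
     "bool_col": True, "float_col": 25.11621924571661},
    {"id": "2", "str_col": "more strings", "int_col": 3774,
     "bool_col": False, "float_col": 30.696120169189474},
]

# write the list of dicts to disk as a JSON array
with open('postgres-records.json', 'w') as json_file:
    json.dump(sample_records, json_file, indent=4)
```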
Use Python to parse JSON
In part one we created a Python script and imported the following libraries to help read, parse, and insert some JSON data into a Postgres table:
```python
# import Python's built-in JSON library
import json, sys

# import the psycopg2 database adapter for PostgreSQL
from psycopg2 import connect, Error
```
Use Python to read a JSON file
The following example code will open a .json file and render the data as Python objects, using the json.load() method call with open():
```python
# use Python's open() function to load the JSON data
with open('postgres-records.json') as json_data:

    # use load() rather than loads() for JSON files
    record_list = json.load(json_data)
```
NOTE: To convert a JSON-compliant string into Python objects, use the json.loads() method instead. The “s” at the end of loads stands for “string”.
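The difference is easy to demonstrate: json.loads() takes a string, while json.load() takes any file-like object, and both return the same Python objects:

```python
import io
import json

raw = '[{"id": 1, "name": "example"}]'

# loads() parses a JSON string directly
from_string = json.loads(raw)

# load() reads from any file-like object (here, an in-memory one)
from_file = json.load(io.StringIO(raw))
```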
Use the following code to verify that the object returned by json.load() is a list, and then grab the first potential Postgres “record”:
```python
if type(record_list) == list:
    first_record = record_list[0]
```
Now let’s use the Python dictionary’s keys() method to retrieve its “column” names in the form of a list:
```python
# get the column names from the first record
columns = list(first_record.keys())
print("\ncolumn names:", columns)
```
You’ll need that column data later when concatenating the INSERT INTO SQL string. Here’s another way you can get the column names using a one-liner:
```python
columns = [list(x.keys()) for x in record_list][0]
```
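Both approaches assume that every record uses the same keys in the same order. If your source data might be ragged, a quick sanity check like this sketch can catch mismatched records before any SQL is built:

```python
# hypothetical records used only for this demonstration
record_list = [
    {"id": "1", "str_col": "a"},
    {"id": "2", "str_col": "b"},
]

columns = list(record_list[0].keys())

# verify every record has exactly the same column names
for i, record in enumerate(record_list):
    if list(record.keys()) != columns:
        raise ValueError("record {} has mismatched columns".format(i))
```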
Declare an SQL string for Postgres records
Let’s start by declaring the string for the SQL statement. The following code will declare the string and put the table name and joined column names into it:
```python
table_name = "json_data"
sql_string = 'INSERT INTO {} '.format(table_name)
sql_string += "(" + ', '.join(columns) + ")\nVALUES "
```
NOTE: Make sure that your Postgres table name doesn’t use any spaces or hyphens.
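Whether a name qualifies can be checked up front. The helper below is a hypothetical sketch using Python’s re module (is_safe_table_name is not part of the article’s script); it rejects anything that isn’t a plain identifier:

```python
import re

def is_safe_table_name(name):
    # hypothetical helper: allow only letters, digits, and underscores,
    # and don't let the name start with a digit
    return re.match(r'^[A-Za-z_][A-Za-z0-9_]*$', name) is not None
```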
The above example code inserts the table name into the string using Python’s str.format() method, which is available in Python 2.6 and later, though the rest of this script targets Python 3.
Use Python to parse the JSON records
This next part is the “meat and potatoes” of the script: it parses the Python dict objects containing your Postgres record data, then formats and appends them to the string in a manner that keeps it SQL-compliant.
Using Python to parse a JSON object
This first example enumerates over the list of dictionaries (Postgres records) and each of their respective values, and then casts each value as a SQL-compliant string before joining them together:
```python
# enumerate over the records
for i, record_dict in enumerate(record_list):

    # iterate over the values of each record dict object
    values = []
    for col_names, val in record_dict.items():

        # Postgres strings must be enclosed with single quotes
        if type(val) == str:
            # escape apostrophes with two single quotes
            val = val.replace("'", "''")
            val = "'" + val + "'"

        values += [ str(val) ]
```
The above code evaluates each record value, and if it’s a string it formats it properly before appending it to a new “values” list. The last step is to enclose each record’s values in parentheses after joining them together, and finally to append a semicolon at the end of the SQL statement:
```python
    # join the list of values and enclose each record in parentheses
    # (this line is still inside the for loop above)
    sql_string += "(" + ', '.join(values) + "),\n"

# remove the last comma and end the statement with a semicolon
sql_string = sql_string[:-2] + ";"
```
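To see the escaping in isolation, here’s a condensed version of the same logic run against one hypothetical record containing an apostrophe; the single quote gets doubled and the whole value gets wrapped in quotes:

```python
# hypothetical single record used only for this demonstration
record = {"id": "1", "str_col": "data's brain"}

values = []
for val in record.values():
    if type(val) == str:
        # double up single quotes, then wrap the value for Postgres
        val = "'" + val.replace("'", "''") + "'"
    values.append(str(val))

row = "(" + ', '.join(values) + ")"
print(row)  # ('1', 'data''s brain')
```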
Another Python and JSON parser example
You can also just put all of the values into a nested Python list using this one-liner:
```python
# create a nested list of the records' values
values = [list(x.values()) for x in record_list]
```
After that, declare an empty string for the record values, then enumerate over them while passing each string to psycopg2’s Json adapter so that single quotes and apostrophes get escaped:
```python
# import the Json adapter from psycopg2
from psycopg2.extras import Json

# value string for the SQL string
values_str = ""

# enumerate over the records' values
for i, record in enumerate(values):

    # declare empty list for values
    val_list = []

    # append each value to a new list of values
    for v, val in enumerate(record):
        if type(val) == str:
            val = str(Json(val)).replace('"', '')
        val_list += [ str(val) ]

    # put parentheses around each record string
    values_str += "(" + ', '.join(val_list) + "),\n"

# remove the last comma and end SQL with a semicolon
values_str = values_str[:-2] + ";"
```
NOTE: Since version 2.5.4 of the psycopg2 library there’s better JSON support thanks to the implementation of its Json adapter. In a terminal or command prompt tab you can use the pip3 list command to have pip return all of its installed Python packages so that you can view their respective version numbers.
The above code will append each joined value list to the values_str object, which you can combine with the column and table names to complete the SQL statement using the following string concatenation:
```python
# concatenate the SQL string
table_name = "json_data"
sql_string = "INSERT INTO %s (%s)\nVALUES %s" % (
    table_name,
    ', '.join(columns),
    values_str
)
```
The above code uses %s placeholders to insert the table name, the joined column names, and the values into the SQL string.
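As an aside, concatenating values directly into SQL is fine for trusted data, but for untrusted input psycopg2’s own parameter binding is safer. This sketch (column names hypothetical, connection code omitted) builds a %s placeholder template that could be passed to cursor.executemany() along with the raw value lists, letting psycopg2 handle all quoting and escaping:

```python
# hypothetical columns and raw (unescaped) record values
columns = ["id", "str_col", "int_col"]
values = [["1", "string data", 1435], ["2", "more strings", 3774]]

# one %s placeholder per column; psycopg2 fills these in safely
placeholders = "(" + ", ".join(["%s"] * len(columns)) + ")"
insert_sql = "INSERT INTO json_data ({}) VALUES {}".format(
    ", ".join(columns), placeholders)

# with a live connection you would then run (not executed here):
# cur.executemany(insert_sql, values)
print(insert_sql)
```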
Insert JSON data into PostgreSQL
If you just want to output the results to the terminal, use Python’s print function to see how the string turned out. Put the following code at the end of your script to have Python output the concatenated SQL string to a terminal window:
```python
print("\nSQL statement:")
print(sql_string)
```
psycopg2 insert example
This next part of the Python code is optional since you may want to inspect or alter the SQL string output before having psycopg2 insert it.
Code for psycopg2 to connect to Postgres
If you want to execute the SQL string in the script instead you can append the following example code that uses the psycopg2 library to connect to a PostgreSQL database:
```python
try:
    # declare a new PostgreSQL connection object
    conn = connect(
        dbname = "python_data",
        user = "postgres",
        host = "192.168.100.123",
        password = "1234",
        # attempt to connect for 3 seconds then raise exception
        connect_timeout = 3
    )

    cur = conn.cursor()
    print("\ncreated cursor object:", cur)

except (Exception, Error) as err:
    print("\npsycopg2 connect error:", err)
    conn = None
    cur = None
```
Use psycopg2 to insert JSON data
Once you’ve created a connection you can use the psycopg2 library’s execute() method to insert the JSON data by executing the string and committing the transaction:
```python
# only attempt to execute SQL if cursor is valid
if cur != None:
    try:
        cur.execute(sql_string)
        conn.commit()
        print('\nfinished INSERT INTO execution')

    except (Exception, Error) as error:
        print("\nexecute_sql() error:", error)
        conn.rollback()

    # close the cursor and connection
    cur.close()
    conn.close()
```
Run the Python script from command line
When the script is complete you can save the code and run it by passing the script’s name to the python3 command in a terminal window:
```shell
python3 insert_json.py
```
If you want to remove all the inserted records from the Postgres table you can execute a TRUNCATE SQL statement in psql, followed by the table name, and all of the records will be deleted.
Conclusion
You should now have some Python-generated SQL for your Postgres table! Depending on how your source JSON data is constructed, the final SQL string returned by the script should look something like this:
```sql
INSERT INTO json_data (id, str_col, int_col, bool_col, float_col)
VALUES ('1', 'string data', 1435, TRUE, 25.11621924571661),
('2', 'more strings', 3774, FALSE, 30.696120169189474);
```
If you’d like to save the script’s output to a file just append the following Python code to the end of your script:
```python
with open('insert-json.sql', 'w') as output_file:
    output_file.write(sql_string)
```
NOTE: The 'w' file mode truncates and re-writes data to the .sql file every time you run the script, but you can also use the 'a' mode to append data to the end of the SQL file instead.
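The difference between the two modes can be verified with a quick sketch; demo.sql is a throwaway file name used only for this illustration:

```python
# 'w' truncates: only the second write survives
with open('demo.sql', 'w') as f:
    f.write("first\n")
with open('demo.sql', 'w') as f:
    f.write("second\n")

# 'a' appends to whatever is already there
with open('demo.sql', 'a') as f:
    f.write("third\n")

with open('demo.sql') as f:
    print(f.read())  # shows the surviving contents
```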
Just the Code
Here’s the complete code you can use to load JSON from a file and convert it into a Postgres-compatible SQL string for the purpose of inserting records into a table:
```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-

# import the psycopg2 database adapter for PostgreSQL
from psycopg2 import connect, Error

# import Python's built-in JSON library
import json

# import the JSON library from psycopg2.extras
from psycopg2.extras import Json

# import psycopg2's 'json' using an alias
from psycopg2.extras import json as psycop_json

# import Python's 'sys' library
import sys

# accept command line arguments for the Postgres table name
if len(sys.argv) > 1:
    table_name = '_'.join(sys.argv[1:])
else:
    # ..otherwise revert to a default table name
    table_name = "json_data"

print("\ntable name for JSON data:", table_name)

# use Python's open() function to load the JSON data
with open('postgres-records.json') as json_data:

    # use load() rather than loads() for JSON files
    record_list = json.load(json_data)

print("\nrecords:", record_list)
print("\nJSON records object type:", type(record_list))  # should return "<class 'list'>"

# concatenate an SQL string
sql_string = 'INSERT INTO {} '.format(table_name)

# if record list then get column names from first key
if type(record_list) == list:
    first_record = record_list[0]

    columns = list(first_record.keys())
    print("\ncolumn names:", columns)

# if just one dict obj or nested JSON dict
else:
    print("Needs to be an array of JSON objects")
    sys.exit()

# enclose the column names within parentheses
sql_string += "(" + ', '.join(columns) + ")\nVALUES "

# enumerate over the records
for i, record_dict in enumerate(record_list):

    # iterate over the values of each record dict object
    values = []
    for col_names, val in record_dict.items():

        # Postgres strings must be enclosed with single quotes
        if type(val) == str:
            # escape apostrophes with two single quotes
            val = val.replace("'", "''")
            val = "'" + val + "'"

        values += [ str(val) ]

    # join the list of values and enclose each record in parentheses
    sql_string += "(" + ', '.join(values) + "),\n"

# remove the last comma and end statement with a semicolon
sql_string = sql_string[:-2] + ";"

print("\nSQL string:")
print(sql_string)
```
Using psycopg2’s Json library
Here’s another example that uses the Json adapter (from psycopg2.extras) to convert string data into an SQL-compliant string; it loads the data from a string within the script instead of opening a .json file:
```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-

# import Python's JSON lib
import json

# import the new JSON method from psycopg2
from psycopg2.extras import Json

# use JSON loads to create a list of records
record_list = json.loads('''[
    {
        "id": "cbf071a789d04f25b28b95976611c876",
        "str_col": "data's positronic brain data's positronic brain",
        "int_col": 1435,
        "bool_col": true,
        "float_col": 25.11621924571661
    },
    {
        "id": "9e5a9557bbac4c94bdb7cb8ef845cf29",
        "str_col": "Postgres' psycopg2 adapter examples examples",
        "int_col": 3774,
        "bool_col": false,
        "float_col": 30.696120169189474
    },
    {
        "id": "ef3051e93c034d16b42d2915d13e4a7a",
        "str_col": "type orkb's tutorials orkb bar ObjectRocket's tutorials",
        "int_col": 1586,
        "bool_col": false,
        "float_col": 4.443295005013988
    }
]''')

# create a nested list of the records' values
values = [list(x.values()) for x in record_list]

# get the column names
columns = [list(x.keys()) for x in record_list][0]

# value string for the SQL string
values_str = ""

# enumerate over the records' values
for i, record in enumerate(values):

    # declare empty list for values
    val_list = []

    # append each value to a new list of values
    for v, val in enumerate(record):
        if type(val) == str:
            val = str(Json(val)).replace('"', '')
        val_list += [ str(val) ]

    # put parentheses around each record string
    values_str += "(" + ', '.join(val_list) + "),\n"

# remove the last comma and end SQL with a semicolon
values_str = values_str[:-2] + ";"

# concatenate the SQL string
table_name = "json_data"
sql_string = "INSERT INTO %s (%s)\nVALUES %s" % (
    table_name,
    ', '.join(columns),
    values_str
)
```