Generate Postgres Test Data with Python (Part 2)
Introduction
In the first part of this article series, we showed you how to set up a Python script that creates randomized Postgres data for use with the INSERT INTO
SQL statement. Here is the link to Part 1.
This installment will show you how to format the JSON data created by the script so that you can easily insert the test data into a PostgreSQL table.
Creating Postgres data
In the last article we declared a Python function called create_postgres_json()
that randomly generates data and stores it in a list of dict
objects meant to represent a JSON array of potential Postgres data.
The following line of code is an example of how you can call the function and specify, with an integer argument, the total number of records you’d like to generate:
```python
json_records = create_postgres_json(3)
```
NOTE: The above function call should return a Python list
object containing dict
“records” for Postgres. To get the complete code for that function you can visit the series’ first article, or scroll down to the end of this article.
Python dict to JSON
You can pass the list of dictionary objects (json_records
) to the json.dumps()
method to convert the Python object into a JSON string:
```python
# use the JSON library to convert the list of dicts into a JSON string
json_records_str = json.dumps(json_records, indent=4)
print("\nPostgres records JSON string:")
print(json_records_str)
```
You should be able to convert the string back to a JSON-compliant dictionary object using the following syntax:
```python
# convert the string back to a list of dict (JSON) objects
json_records = json.loads(json_records_str)
```
If this raises an error, there's a problem with the data, or you're passing the wrong type of argument to the method call. The json.loads()
method only accepts a string (or bytes) containing a valid JSON document, such as a serialized object or array.
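To make the round trip concrete, here is a minimal, self-contained sketch; the sample record below is made up purely for illustration:

```python
import json

# a sample list of dicts, standing in for the generated records
json_records = [{"id": "abc123", "int_col": 42, "bool_col": True}]

# serialize the list to a JSON string (indent is purely cosmetic)
json_records_str = json.dumps(json_records, indent=4)

# parse the string back into equivalent Python objects
round_trip = json.loads(json_records_str)

print(round_trip == json_records)  # True -- the data survives the round trip
```

Note that json.dumps() always returns a str, so the round trip only works on JSON-serializable values (dicts, lists, strings, numbers, booleans, and None).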
Create a Postgres SQL string
The last step is to create a function that converts the data into a valid SQL string that can insert the records into a Postgres table.
Define another function that can accept a Python list of dictionary objects, and a string for the Postgres table name, as arguments:
```python
def create_insert_records(json_array, table_name):
    # get the columns for the JSON records
    columns = json_array[0].keys()

    # SQL column names should be lowercase, with underscores instead of spaces/hyphens
    columns = [str(col).lower().replace(" ", "_") for col in columns]
    columns = [str(col).lower().replace("-", "_") for col in columns]
    print("\ncolumns:", columns)
```
NOTE: Naming conventions for Postgres objects are lowercase strings that use underscores (_
) instead of spaces, dashes, or hyphens. The above code will ensure that your Postgres column names conform to these conventions.
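As a quick illustration of that normalization step, here is what it does to some hypothetical, convention-breaking column names (the names below are invented for the example):

```python
# hypothetical column names that break Postgres naming conventions
raw_columns = ["Street Address", "ZIP-Code", "id"]

# lowercase each name, then swap spaces and hyphens for underscores
columns = [col.lower().replace(" ", "_").replace("-", "_") for col in raw_columns]

print(columns)  # ['street_address', 'zip_code', 'id']
```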
Now declare a string for the Postgres INSERT INTO
SQL statement:
```python
    # concatenate a string for the SQL 'INSERT INTO' statement
    sql_string = "INSERT INTO {}".format(table_name)
    sql_string = sql_string + " (" + ', '.join(columns) + ")\nVALUES "
```
Then declare a list for the records and enumerate()
over the list of dictionary objects containing the randomized data:
```python
    record_list = []

    for i, record in enumerate(json_array):
        # use map() to cast all items in the record as strings if needed
        #record = list(map(str, record.values()))
        record = list(record.values())
        print(record)
```
Fix the Postgres record data
Now you’ll need to enumerate over the values in each record and fix them so that Postgres won’t raise any syntax errors when you attempt to execute the SQL string:
```python
        # fix the values in the list if needed
        for i, val in enumerate(record):
            if type(val) == str and "'" in val:
                # POSIX escape-string syntax for single quotes
                record[i] = "E'" + val.replace("'", "''") + "'"

        # cast the record as a string and remove the list brackets []
        record = str(record).replace("[", '').replace("]", '')

        # remove double quotes as well
        record = record.replace('"', '')
```
NOTE: All string literals in Postgres must be enclosed in single quotation marks, so the above code checks each record's values for Python string types (str
) and replaces any single quotes within them with an escape sequence that allows for apostrophes in the data.
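Here is the escaping logic in isolation, using one of the apostrophe-containing sample strings from the word list:

```python
# a value containing an apostrophe, taken from the script's sample words
val = "ObjectRocket's tutorials"

# double the single quote and wrap the value in Postgres E'...' escape-string syntax
escaped = "E'" + val.replace("'", "''") + "'"

print(escaped)  # E'ObjectRocket''s tutorials'
```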
Append the Postgres records to the string
The last step is to append each record string to the list, then iterate over the records one more time to wrap each one in parentheses before completing the INSERT INTO
statement:
```python
        # ..now append the record to the list
        record_list += [record]

    # enumerate() over the records and append each to the SQL string
    for i, record in enumerate(record_list):
        # append the record's string of values to the SQL string
        sql_string = sql_string + "(" + record + "),\n"

    # replace the last comma with a semicolon
    sql_string = sql_string[:-2] + ";"

    return sql_string
```
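To see the shape of the statement this string-building produces, here is a minimal standalone sketch; the records, columns, and table name below are invented for illustration:

```python
# two hypothetical pre-escaped record strings and their columns
records = ["'a1', 'foo', 42", "'b2', 'bar', 7"]
columns = ["id", "str_col", "int_col"]
table_name = "test_table"

# build the INSERT INTO statement the same way the function does
sql_string = "INSERT INTO {}".format(table_name)
sql_string += " (" + ", ".join(columns) + ")\nVALUES "
for record in records:
    sql_string += "(" + record + "),\n"

# swap the trailing ",\n" for a semicolon
sql_string = sql_string[:-2] + ";"

print(sql_string)
# INSERT INTO test_table (id, str_col, int_col)
# VALUES ('a1', 'foo', 42),
# ('b2', 'bar', 7);
```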
Create the ‘INSERT INTO’ SQL string
The next bit of code lets you pass the Postgres table name to the script dynamically, as a command line argument, when running the Python script:
```python
# allow the table name to be passed to the script
if len(sys.argv) > 1:
    table_name = '_'.join(sys.argv[1:])
else:
    # otherwise use a default table name
    table_name = 'test_table'
```
NOTE: The sys.argv
list must have a length greater than one because its first element is just the script's name; the actual arguments passed by the user start at index 1.
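The slicing-and-joining behavior can be demonstrated without actually running from the command line by using a stand-in list (argv below simulates what the interpreter would put in sys.argv):

```python
# simulate command-line arguments as the interpreter would populate sys.argv
argv = ["my_script.py", "python", "test"]  # stand-in for sys.argv

if len(argv) > 1:
    # skip the script name at index 0, join the rest with underscores
    table_name = "_".join(argv[1:])
else:
    table_name = "test_table"

print(table_name)  # python_test
```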
Call the create_insert_records()
function that we created earlier and make sure to pass the list of dict
objects to it, as well as the table name string:
```python
# call the function to create the INSERT INTO SQL string
sql_str = create_insert_records(json_records, table_name)
print('\nsql_str:')
print(sql_str)
```
If you don’t want to use Python and psycopg2 to insert the records, just save your code as-is, execute the script with python3
, and copy the SQL string printed to your terminal or command prompt window so that you can paste it into the database’s psql interface.
However, if you would like to have the psycopg2 adapter insert the records automatically, you’ll have to call the function and then pass the returned string to a psycopg2 cursor object.
The following code does this inside a try/except block in case any exceptions or SQL syntax errors are raised:
```python
# call the function to create the INSERT INTO SQL string
sql_str = create_insert_records(json_records, table_name)

# only attempt to execute the SQL if the cursor is valid
if cur != None:
    try:
        cur.execute(sql_str)
        conn.commit()
        print('finished INSERT INTO execution')
    except (Exception, Error) as error:
        print("\nexecute_sql() error:", error)
        conn.rollback()

    # close the cursor and connection
    cur.close()
    conn.close()
```
Execute the Python script
Make sure to save your script, then use the python3
command (followed by your script’s name) to execute the Python code; it should generate and insert the randomized Postgres data. The following example instructs Python to create data for a table named python_test
:
```shell
python3 my_script.py python test
```
You can also just run it with python3 my_script.py
and it will attempt to insert the data into the Postgres table name hard coded into the script.
Conclusion
If everything works as expected you should be able to access psql and have it return the inserted records using the SELECT * FROM python_test;
SQL statement (make sure to type q
to close the full screen viewer).
Since the function uses randomly generated UUIDs for the records’ IDs, you can run the script multiple times and it should just insert more records without having it return a duplicate key SQL error.
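The uniqueness claim is easy to sanity-check: uuid.uuid4().hex produces a 32-character hexadecimal string, and collisions between version-4 UUIDs are so improbable that repeated runs will keep generating distinct primary keys:

```python
import uuid

# generate a batch of hex UUIDs, as the script does for the 'id' column
ids = [uuid.uuid4().hex for _ in range(1000)]

# every value is a 32-character hex string, and all of them are distinct
print(all(len(i) == 32 for i in ids))
print(len(set(ids)) == len(ids))
```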
DISCLAIMER: This code should be used for development and experimental purposes only. Running automated scripts like this on production servers could potentially leave your data vulnerable to security risks and SQL injection attacks.
We’ve covered how to create randomized JSON data, and then format it to be an SQL-compliant INSERT INTO
string. You should now have some test data to play around with inside of your PostgreSQL table simply by running a Python script!
Just the Code
```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-

# import libraries to generate data
import random, uuid, time, json, sys
from psycopg2 import sql, Error, connect

# requires 'random' library
def get_random_words(words_list, total=1):
    ran_words = []

    # build a list of the specified number of random words
    while len(ran_words) < total:
        ran_words += [words_list[random.randint(0, len(words_list)-1)]]

    # return a string by joining the list of words
    return ' '.join(ran_words)

def create_postgres_json(size):
    # list to store JSON records
    records = []

    # random words to inject into records
    words = ["ObjectRocket's tutorials", 'orkb', "orkb's tutorials",
             "Postgres' psycopg2 adapter", "data's positronic brain",
             'type', 'examples', 'foo', 'bar']

    # iterate over the number of records being created
    for rec_id in range(size):
        # create a new record dict
        new_record = {}

        # input a value for each table column
        new_record['id'] = uuid.uuid4().hex
        new_record['str_col'] = get_random_words(words, random.randint(1, len(words)))
        new_record['int_col'] = random.randint(1, 6000)
        new_record['bool_col'] = [True, False][random.randint(0, 1)]
        new_record['float_col'] = random.uniform(1.5, 99.9)

        # append the new record dict to the list
        records += [new_record]

    # return the list of JSON records
    return records

def create_insert_records(json_array, table_name):
    # get the columns for the JSON records
    columns = json_array[0].keys()

    # SQL column names should be lowercase, with underscores instead of spaces/hyphens
    columns = [str(col).lower().replace(" ", "_") for col in columns]
    columns = [str(col).lower().replace("-", "_") for col in columns]
    print("\ncolumns:", columns)

    # concatenate a string for the SQL 'INSERT INTO' statement
    sql_string = "INSERT INTO {}".format(table_name)
    sql_string = sql_string + " (" + ', '.join(columns) + ")\nVALUES "

    record_list = []

    for i, record in enumerate(json_array):
        record = list(record.values())
        print(record)

        # fix the values in the list if needed
        for i, val in enumerate(record):
            if type(val) == str and "'" in val:
                # POSIX escape-string syntax for single quotes
                record[i] = "E'" + val.replace("'", "''") + "'"

        # cast the record as a string and remove the list brackets []
        record = str(record).replace("[", '').replace("]", '')

        # remove double quotes as well
        record = record.replace('"', '')

        # ..now append the record to the list
        record_list += [record]

    # enumerate() over the records and append each to the SQL string
    for i, record in enumerate(record_list):
        sql_string = sql_string + "(" + record + "),\n"

    # replace the last comma with a semicolon
    sql_string = sql_string[:-2] + ";"
    return sql_string

print("\n")

# generate records for Postgres
json_records = create_postgres_json(3)

# use the JSON library to convert the list of dicts into a JSON string
json_records_str = json.dumps(json_records, indent=4)
print("\nPostgres records JSON string:")
print(json_records_str)

# convert the string back to a list of dict (JSON) objects
json_records = json.loads(json_records_str)

# allow the table name to be passed to the script
if len(sys.argv) > 1:
    table_name = '_'.join(sys.argv[1:])
else:
    # otherwise use a default table name
    table_name = 'test_table'

# call the function to create the INSERT INTO SQL string
sql_str = create_insert_records(json_records, table_name)
print('\nsql_str:')
print(sql_str)

# save the generated Postgres records in a JSON file
# (write the JSON string, not the list object)
with open('postgres-records.json', 'w') as output_file:
    output_file.write(json_records_str)

try:
    # declare a new PostgreSQL connection object
    conn = connect(
        dbname="python_data",
        user="orkb",
        host="192.168.100.40",
        password="1234",
        # attempt to connect for 3 seconds then raise exception
        connect_timeout=3
    )
    cur = conn.cursor()
    print("\ncreated cursor object:", cur)
except Error as err:
    print("\npsycopg2 connect error:", err)
    conn = None
    cur = None

# only attempt to execute the SQL if the cursor is valid
if cur != None:
    try:
        cur.execute(sql_str)
        conn.commit()
        print('finished INSERT INTO execution')
    except (Exception, Error) as error:
        print("\nexecute_sql() error:", error)
        conn.rollback()

    # close the cursor and connection
    cur.close()
    conn.close()
```