Generate Postgres Test Data with Python (Part 2)


Introduction

In the first part of this article series, we showed you how to set up a Python script that creates randomized Postgres data to use in conjunction with the INSERT INTO SQL statement. Here is the link to Part 1.

This installment will show you how to format the JSON data created by that script so that you can easily insert some test data into a PostgreSQL table.

Creating Postgres data

In the last article we declared a Python function called create_postgres_json() that randomly generates data and stores it in a list of dict objects meant to represent a JSON array of potential Postgres data.

The following line of code is an example of how you can call the function and specify, with an integer argument, the total number of records you’d like to generate:

json_records = create_postgres_json(3)

NOTE: The above function call should return a Python list object containing dict “records” for Postgres. To get the complete code for that function you can visit the series’ first article, or scroll down to the end of this article.
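As a quick sanity check, you can verify the shape of the returned list. The create_postgres_json() stand-in below is a condensed sketch that only mimics the record shape of the real function from Part 1 (the full version appears at the end of this article):

```python
import random
import uuid

# condensed stand-in mimicking the record shape of create_postgres_json() from Part 1
def create_postgres_json(size):
    records = []
    for _ in range(size):
        records.append({
            'id': uuid.uuid4().hex,
            'str_col': 'example text',
            'int_col': random.randint(1, 6000),
            'bool_col': random.choice([True, False]),
            'float_col': random.uniform(1.5, 99.9),
        })
    return records

json_records = create_postgres_json(3)
print(len(json_records))                 # 3 dict "records"
print(sorted(json_records[0].keys()))
```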

Python dict to JSON

You can pass the list of dictionary objects (json_records) to the json.dumps() method to convert the Python object into a JSON string:

# use the JSON library to convert JSON array into a Python string
json_records_str = json.dumps(json_records, indent=4)

print ("\nPostgres records JSON string:")
print (json_records_str)

Screenshot of a function to generate Postgres test data in Python

You should be able to convert the string back into a Python object (in this case, a list of dict records) using the following syntax:

# convert the JSON string back into a Python list of dict records
json_records = json.loads(json_records_str)

If this call raises an error, then there's a problem with the data, or you're passing the wrong type to the method. The json.loads() method only accepts a string (or bytes) containing valid JSON; it raises a json.JSONDecodeError if the string cannot be parsed.
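For example, a valid JSON string parses back into a Python list, while a malformed one (a trailing comma here) makes json.loads() raise json.JSONDecodeError:

```python
import json

valid = '[{"id": "abc123", "int_col": 42}]'
records = json.loads(valid)
print(type(records).__name__)   # list

broken = '[{"id": "abc123", }]'  # trailing comma is not valid JSON
try:
    json.loads(broken)
    decode_failed = False
except json.JSONDecodeError as err:
    decode_failed = True
    print("invalid JSON:", err)
```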

Create a Postgres SQL string

The last step is to create a function that converts the data into a valid SQL string that can insert the records into a Postgres table.

Define another function that can accept a Python list of dictionary objects, and a string for the Postgres table name, as arguments:

def create_insert_records( json_array, table_name ):

    # get the columns for the JSON records
    columns = json_array[0].keys()

    # SQL column names should be lowercase using underscores instead of spaces/hyphens
    columns = [str(col).lower().replace(" ", "_") for col in columns]
    columns = [str(col).lower().replace("-", "_") for col in columns]
    print ("\ncolumns:", columns)

NOTE: The naming convention for Postgres identifiers is lowercase, with underscores (_) in place of spaces or hyphens. The above code ensures that your Postgres column names conform to this convention.
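For instance, running the two list comprehensions above on some hypothetical, non-conforming column names produces Postgres-friendly identifiers:

```python
# hypothetical column names that don't follow Postgres naming conventions
raw_columns = ["First Name", "e-mail Address", "int_col"]

# lowercase everything, then swap spaces and hyphens for underscores
columns = [str(col).lower().replace(" ", "_") for col in raw_columns]
columns = [col.replace("-", "_") for col in columns]
print(columns)   # ['first_name', 'e_mail_address', 'int_col']
```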

Now declare a string for the Postgres INSERT INTO SQL statement:

    # concatenate a string for the SQL 'INSERT INTO' statement
    sql_string = "INSERT INTO {}".format(table_name)
    sql_string = sql_string + " (" + ', '.join(columns) + ")\nVALUES "
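With the column names used in this series, the two lines above produce a statement prefix like the following (a standalone snippet for illustration):

```python
table_name = "test_table"
columns = ['id', 'str_col', 'int_col', 'bool_col', 'float_col']

# build the 'INSERT INTO table (columns)\nVALUES ' prefix
sql_string = "INSERT INTO {}".format(table_name)
sql_string = sql_string + " (" + ', '.join(columns) + ")\nVALUES "
print(sql_string)
# INSERT INTO test_table (id, str_col, int_col, bool_col, float_col)
# VALUES
```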

Then declare a list for the records and enumerate() over the list of dictionary objects containing the randomized data:

    record_list = []
    for i, record in enumerate( json_array ):

        keys = record.keys()
        values = record.values()

        # use map() to cast all items in record list as a string
        #record = list(map(str, values))
        record = list(values)
        print (record)

Fix the Postgres record data

Now you’ll need to iterate or enumerate over the values for each record and fix them so that Postgres won’t raise any syntax errors when you attempt to execute the SQL string:

        # fix the values in the list if needed
        for i, val in enumerate(record):

            if type(val) == str:
                if "'" in val:
                    # posix escape string syntax for single quotes
                    record[i] = "E'" + record[i].replace("'", "''") + "'"

        # cast record as string and remove the list brackets []
        record = str(record).replace("[", '')
        record = record.replace("]", '')

        # remove double quotes as well
        record = record.replace('"', '')

NOTE: All string literals in Postgres must be enclosed in single quotation marks, so the above code checks each record for Python string values (str) and then doubles any single quotes within them, using Postgres's E'' escape string syntax so that apostrophes in the data won't break the statement.
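Here's the escaping logic in isolation, using one of the apostrophe-laden words from the script's word list:

```python
val = "data's positronic brain"

# double the single quote and wrap it in Postgres E'' escape string syntax
escaped = "E'" + val.replace("'", "''") + "'"
print(escaped)   # E'data''s positronic brain'
```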

Append the Postgres records to the string

The last step is to append each record to the list, and then iterate over the records one final time to wrap parentheses around each record string before completing the INSERT INTO statement:

        # ..now append the records to the list
        record_list += [ record ]

    # enumerate() over the records and append to SQL string
    for i, record in enumerate(record_list):

        # use map() to cast all items in record list as a string
        #record = list(map(str, record))

        # append the record list of string values to the SQL string
        sql_string = sql_string + "(" + record + "),\n"

    # replace the last comma with a semicolon
    sql_string = sql_string[:-2] + ";"

    return sql_string
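Putting the string-building steps together on a couple of hand-made records shows what the finished statement looks like (a condensed sketch of the same logic, not the full function):

```python
# two hand-made sample records for illustration
records = [
    {'id': 'aaa111', 'str_col': "foo's bar", 'int_col': 7},
    {'id': 'bbb222', 'str_col': 'plain text', 'int_col': 13},
]

columns = list(records[0].keys())
sql_string = "INSERT INTO test_table (" + ', '.join(columns) + ")\nVALUES "

record_list = []
for record in records:
    values = list(record.values())
    for i, val in enumerate(values):
        if type(val) == str and "'" in val:
            # posix escape string syntax for single quotes
            values[i] = "E'" + val.replace("'", "''") + "'"
    # strip the list brackets and any double quotes from the repr
    row = str(values).replace("[", "").replace("]", "").replace('"', "")
    record_list.append(row)

# wrap each record in parentheses and end the statement with a semicolon
sql_string += ",\n".join("(" + row + ")" for row in record_list) + ";"
print(sql_string)
```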

Create the ‘INSERT INTO’ SQL string

The next bit of code allows you to pass the Postgres table name to the script as a command-line argument when you run it:

# allow the table name to be passed to the script
if len(sys.argv) > 1:
    table_name = '_'.join(sys.argv[1:])
else:
    # otherwise use a default table name
    table_name = 'test_table'

NOTE: The sys.argv list must have a length that is greater than one because the first element in the list is just a string of the script’s name, so you’ll need to access all of the subsequent elements to get the actual system arguments passed by the user.
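You can see how the join works by simulating the argv list that a command like python3 my_script.py python test would produce:

```python
# simulated sys.argv for: python3 my_script.py python test
argv = ['my_script.py', 'python', 'test']

if len(argv) > 1:
    # join every argument after the script name with underscores
    table_name = '_'.join(argv[1:])
else:
    table_name = 'test_table'

print(table_name)   # python_test
```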

Call the create_insert_records() function that we created earlier and make sure to pass the list of dict objects to it, as well as the table name string:

# call the function to create INSERT INTO SQL string
sql_str = create_insert_records( json_records, table_name )

print ('\nsql_str:')
print (sql_str)

If you don’t want to use Python and psycopg2 to insert the records, then just save your code as-is, execute the script with python3, and copy the SQL string printed in your terminal or command prompt window so that you can paste it into your database’s psql interface.

Screenshot of records generate Postgres test data with Python pasted into psql

However, if you would like to have the psycopg2 adapter insert the records automatically, you’ll have to call the function and then pass the returned string to a psycopg2 cursor object.

The following code does this inside a try-except block in case any exceptions or SQL syntax errors are raised while executing the string:

# call the function to create INSERT INTO SQL string
sql_str = create_insert_records( json_records, table_name )

if cur is not None:

    try:
        sql_resp = cur.execute( sql_str )
        conn.commit()
        print ('finished INSERT INTO execution')

    except (Exception, Error) as error:
        print("\nexecute_sql() error:", error)
        conn.rollback()

    # close the cursor and connection
    cur.close()
    conn.close()

Execute the Python script

Make sure to save your script, and then use the python3 command (followed by your script’s name) to execute the Python code; it should generate and insert the randomized Postgres data. The following example instructs Python to create data for a table named python_test:

python3 my_script.py python test

You can also just run it with python3 my_script.py, and it will attempt to insert the data into the default table name hard-coded in the script (test_table).

Conclusion

If everything works as expected, you should be able to access psql and have it return the inserted records using the SELECT * FROM python_test; SQL statement (press q to exit psql’s pager).

Since the function uses randomly generated UUIDs for the records’ IDs, you can run the script multiple times and it should just insert more records without having it return a duplicate key SQL error.
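You can get a feel for why duplicate keys are so unlikely by generating a batch of uuid4() hex IDs and checking that they are all distinct (not a hard guarantee, but collisions are astronomically improbable):

```python
import uuid

# generate 10,000 random hex IDs; a set keeps only distinct values
ids = {uuid.uuid4().hex for _ in range(10000)}
print(len(ids))
```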

Screenshot of random Postgres records generated using Python

DISCLAIMER: This code should be used for development and experimental purposes only. Running automated scripts like this on production servers could potentially leave your data vulnerable to security risks and SQL injection attacks.

We’ve covered how to create randomized JSON data, and then format it to be an SQL-compliant INSERT INTO string. You should now have some test data to play around with inside of your PostgreSQL table simply by running a Python script!

Just the Code

#!/usr/bin/python3
# -*- coding: utf-8 -*-

# import libraries to generate data
import random, uuid, time, json, sys
from psycopg2 import sql, Error, connect

# requires 'random' library
def get_random_words(words_list, total=1):
    ran_words = []

    # enumerate over specified number of words
    while len(ran_words) < total:
        ran_words += [words_list[random.randint(0, len(words_list)-1)]]

    # return a string by joining the list of words
    return ' '.join(ran_words)

def create_postgres_json(size):

    # list to store JSON records
    records = []

    # random words to inject into records
    words = ["ObjectRocket's tutorials", 'orkb', "orkb's tutorials",
    "Postgres' psycopg2 adapter", "data's positronic brain", 'type', 'examples', 'foo', 'bar']

    # iterate over the number of records being created
    for rec_id in range(size):

        # create a new record dict
        new_record = {}

        # input a value for each table column
        new_record[ 'id' ] = uuid.uuid4().hex
        new_record[ 'str_col' ] = get_random_words( words, random.randint(1, len(words)) )
        new_record[ 'int_col' ] = random.randint(1, 6000)
        new_record[ 'bool_col' ] = [True, False][random.randint(0, 1)]
        new_record[ 'float_col' ] = random.uniform(1.5, 99.9)

        # append the new record dict to the list
        records += [ new_record ]

    # return the list of JSON records
    return records

def create_insert_records( json_array, table_name ):

    # get the columns for the JSON records
    columns = json_array[0].keys()

    # SQL column names should be lowercase using underscores instead of spaces/hyphens
    columns = [str(col).lower().replace(" ", "_") for col in columns]
    columns = [str(col).lower().replace("-", "_") for col in columns]
    print ("\ncolumns:", columns)

    # concatenate a string for the SQL 'INSERT INTO' statement
    sql_string = "INSERT INTO {}".format(table_name)
    sql_string = sql_string + " (" + ', '.join(columns) + ")\nVALUES "

    record_list = []
    for i, record in enumerate( json_array ):

        keys = record.keys()
        values = record.values()

        # use map() to cast all items in record list as a string
        #record = list(map(str, values))
        record = list(values)
        print (record)

        # fix the values in the list if needed
        for i, val in enumerate(record):

            if type(val) == str:
                if "'" in val:
                    # posix escape string syntax for single quotes
                    record[i] = "E'" + record[i].replace("'", "''") + "'"

        # cast record as string and remove the list brackets []
        record = str(record).replace("[", '')
        record = record.replace("]", '')

        # remove double quotes as well
        record = record.replace('"', '')

        # ..now append the records to the list
        record_list += [ record ]

    # enumerate() over the records and append to SQL string
    for i, record in enumerate(record_list):

        # use map() to cast all items in record list as a string
        #record = list(map(str, record))

        # append the record list of string values to the SQL string
        sql_string = sql_string + "(" + record + "),\n"

    # replace the last comma with a semicolon
    sql_string = sql_string[:-2] + ";"

    return sql_string

print ("\n")

# generate records for Postgres
json_records = create_postgres_json(3)

# use the JSON library to convert JSON array into a Python string
json_records_str = json.dumps(json_records, indent=4)

print ("\nPostgres records JSON string:")
print (json_records_str)

# convert the JSON string back into a Python list of dict records
json_records = json.loads(json_records_str)

# allow the table name to be passed to the script
if len(sys.argv) > 1:
    table_name = '_'.join(sys.argv[1:])
else:
    # otherwise use a default table name
    table_name = 'test_table'

# call the function to create INSERT INTO SQL string
sql_str = create_insert_records( json_records, table_name )

print ('\nsql_str:')
print (sql_str)


# save the generated Postgres records in a JSON file
with open('postgres-records.json', 'w') as output_file:
    output_file.write(json_records_str)

try:
    # declare a new PostgreSQL connection object
    conn = connect(
        dbname = "python_data",
        user = "orkb",
        host = "192.168.100.40",
        password = "1234",
        # attempt to connect for 3 seconds then raise exception
        connect_timeout = 3
    )

    cur = conn.cursor()
    print ("\ncreated cursor object:", cur)

except Error as err:
    print ("\npsycopg2 connect error:", err)
    conn = None
    cur = None

# only attempt to execute SQL if cursor is valid
if cur is not None:

    try:
        sql_resp = cur.execute( sql_str )
        conn.commit()
        print ('finished INSERT INTO execution')

    except (Exception, Error) as error:
        print("\nexecute_sql() error:", error)
        conn.rollback()

    # close the cursor and connection
    cur.close()
    conn.close()
