How to Create PostgreSQL Test Data
Introduction
When you’re developing applications or trying to evaluate query performance, it can be helpful to have some test data on hand. However, creating a large dataset manually can be tedious and time-consuming. With PostgreSQL, it’s easy to create a test dataset containing randomly-generated values with just a couple of simple commands. In this article, we’ll show you how to generate PostgreSQL test data using the psql
command-line interface.
Prerequisites
Before going any further with this tutorial, you’ll need to make sure you have a PostgreSQL database cluster working on your machine. You can use the command systemctl status postgresql
on a Linux machine to check if it’s installed. It’s also important to have Docker installed on your machine and to have some basic knowledge of how to use a Docker container– we’ll be using a container to hold our PostgreSQL test data.
Postgres Docker container
A Docker container is used for the development and deployment of applications. With a container, applications can run faster and become portable and scalable.
To create and run a Docker container, you’ll need to use the commands shown below:
1 | docker run --name objectrocket -e POSTGRES_PASSWORD=1234 -p 5432:5432 -d postgres |
The Docker container will automatically pull the image for PostgreSQL, and it will be used as our container.
Let’s review some of the options available for a Docker command:
The
--name
option will provide the name of the container you will run and use.The
-e
option sets an environment variable within the Docker container; in our example, we set the PostgreSQL password.You can use the
-p
option to bind the5432:5432
port to the localhost.The
-d
option will be used to launch the container in the background along with pulling the PostgreSQL image.
To see a list of available running containers, use this command:
1 | docker ps |
The output will look something like this:
1 2 | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 16bff39ba7de postgres "docker-entrypoint.s…" 10 seconds ago Up 6 seconds 0.0.0.0:5432->5432/tcp objectrocket |
Now that we’ve reviewed some Docker fundamentals, let’s create a table in the psql
command-line console for PostgreSQL.
To access psql
on your machine, open your terminal and use the following command:
1 | sudo -u postgres psql -h localhost -p 5432 |
The table we’re going to create will hold our generated PostgreSQL test data.
Create a table in Postgres
Now we’ll use the postgres
database to create a table.
To create a table in PostgreSQL, we use the CREATE TABLE
command. The basic syntax is shown below:
1 | CREATE TABLE TABLE_NAME(COLUMN + datatype + constraints [optional]); |
Here’s the statement we’ll use to create the table for our test data:
1 2 3 4 5 6 7 | CREATE TABLE users( id BIGINT GENERATED ALWAYS AS IDENTITY, PRIMARY KEY(id), hash_firstname TEXT NOT NULL, hash_lastname TEXT NOT NULL, gender VARCHAR(6) NOT NULL CHECK (gender IN ('male', 'female')) ); |
PostgreSQL test data
Next, we’ll INSERT
values into the table using a random function to generate the data:
1 2 | INSERT INTO users(hash_firstname, hash_lastname, gender) SELECT md5(RANDOM()::TEXT), md5(RANDOM()::TEXT), CASE WHEN RANDOM() < 0.5 THEN 'male' ELSE 'female' END FROM generate.series(1, 10000); |
NOTE: This function uses md5
, which is an algorithm used for hash functions that produces a 128-bit hash value.
Let’s check out the results from the data that has been generated:
Using the SELECT
statement, we can count the total rows of data in the users
table:
1 2 3 4 5 | SELECT COUNT(*) FROM users; COUNT ------- 10000 (1 ROW) |
We can also count the number of rows where the value of gender
is equal to male
:
1 2 3 4 5 | SELECT COUNT(*) FROM users WHERE gender = 'male'; COUNT ------- 5048 (1 ROW) |
We can do the same for the female
data:
1 2 3 4 | COUNT ------- 4952 (1 ROW) |
Finally, let’s select and display all the data from the users
table. Since we’re dealing with a large amount of data, we’ll use the LIMIT
clause to return just a few rows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | SELECT * FROM users LIMIT 15; id | hash_firstname | hash_lastname | gender ----+----------------------------------+----------------------------------+-------- 1 | 6f649335d8f4dc2510fa5a1f7eb49d2e | 3742f76184a441fecb00e7a9ed655cd4 | male 2 | 36fcde01f52ad3ed359c0fa89cf07664 | 24660b13820dedaae22ccb4d71422b99 | male 3 | f68a8fdbeee077491d0015fd37c78fcf | daa6f64b3eb60a9b6494c39a666c7c71 | male 4 | 166fbca935e7175880b8e34440487c3c | 682027f66518552ab04f1105e2a0eae7 | female 5 | 4972fcafc6ae949646762d52184f900d | 666becc1d8f50993d5d4dc435e8bc534 | male 6 | 86631b9a3cc4eae00619acc0ed2d781e | 094d7867fbe572c0cbed3b5996262fe6 | female 7 | c60c876550035c5f36045b765ad64ef6 | 48e731262757b8176ee2d593eedd94d3 | male 8 | daf7c30dba0679f0859e6358c9a2da53 | e44e73102cdc3059a3e6edc483faad2e | male 9 | 1b657d7e4c37c08b5c72572fdeaf0a9c | b5a48a1c44b9247cc3d9d4f14a4efdc4 | male 10 | 6e26230bb9df62a8e44797a0cc37ea05 | 942977ab1fd00a9ea665668d13cf9a7d | female 11 | 0533a975b5bfda8149fec58e213fd796 | 30ca1aa2cdbaa440897c4bd2e68ab5c3 | male 12 | f5d1e6d98831f50cc80f115c6baebf76 | 857404ddaf714e7284f7a441d91a225f | female 13 | f1233af1bd103ac3c23331681a5d425d | dcf43368115c0786aee36ffad469dae7 | female 14 | d82253175a06100a842f8db37bf7fea8 | a2e8f9d83fb88e3490b6d76d330e6b5b | male 15 | ddd01e9d271fa4a1e063bf55db0b537c | 511c5e95fe3ee8cfca797a30131305cd | male (15 ROWS) |
Conclusion
When you need some PostgreSQL test data but don’t want to waste time manually inserting records, it’s easy to create a dataset using randomly-generated values. In this article, we walked you through the process of creating a test dataset in PostgreSQL. With our examples to guide you, you’ll be able to generate your own dataset to use in testing and development.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started