Use of Random Function in Postgres SQL
In this article, we’ll learn to use the Random function in our Postgres SQL queries. We’ll start simple and then build a mini-project in order to better understand the scope of this function. Here we will discuss:
- What? From a high level, what does the RANDOM function do?
- Why? What are a couple situations where we would use the Random() function?
- How? How do we use this function in our PostgreSQL database?
- More We’ll also learn about the Floor, Generate_Series, and Count functions.
- Knowledge of some basic SQL (PL/SQL) statements, like
What is the RANDOM() function and how does it work?
RANDOM() function returns the a random number between 0 (including) and 1 (not including).
This will return numbers like 0.02355213, 0.33824445, 0.90257826, etc. All less than 1.
Why and when would we use the RANDOM() function?
Any time you need one or more numbers that are random, as in not easily predictable, like flipping a coin or rolling the dice in a game. You may want to choose random rows from a table. Or you may actually want to emulate rolling 20-sided dice. Or maybe you want to fill a table with random numbers. We’ll use that last idea as your mini project.
But first, let’s back up and make sure you fully understand how the RANDOM() function works by itself.
-- to get a random between (and including) 1 and 10:
-- to get a random integer between (and including) i_smallest and i_largest:
Quick note in case the FLOOR() function is new for you: It’s like the INT() function in many languages. It basically turns a number into an integer. Another way to put it: It chops off the decimal point.
Now let’s begin building our mini-project
Set a goal
Our goal is to fill a table with random numbers. We’ll begin by creating a practice table called “test_random”:
id serial NOT NULL
, i_number int4 NULL DEFAULT 0
, i_random_number int8 NULL DEFAULT 0
, i_times_picked int8 NOT NULL DEFAULT 0
, d_time TIME NULL DEFAULT now()
CREATE UNIQUE INDEX test_random_id_idx ON public.test_random USING btree (id);
The code above should give you a table like this:
You may be studying the table we just created and wondering why certain columns? Let’s see:
id: This is just good database design practice, to create a table with an indexed id field. Quite often you will need a field like this with a unique index, especially in relational databases where you (hopefully) tend to normalize. Also, we set ours to auto-increment, which is another common practice we recommend.
i_number: This field is purely so we can more easily follow what is going on when we insert and update data in our testrandom table. In many ways, this column is redundant, but – depending on what you do with this table _after this lesson – it could be quite useful. So we chose to include it.
i_random_number: This is the bread & butter of our project. This is the column where we store the random numbers we create using the Postgres RANDOM() function.
i_times_picked: This is the field we will use to store our calculation later when we UPDATE the table with a count (using the COUNT() function) of how many times any given random number popped up. We’ll use that in order to double-check just how random our numbers are.
d_time: This is another field we put in for your future use of this table, in case you decide to do testing to find out speeds of various techniques of using the RANDOM function in your SQL queries. Here, we are working with small datasets, but you could choose to change the _generateseries() function we use later to generate hundreds of thousands of rows instead of the small number we do here in a minute. And if you do that, you may wish to compare how much time it takes to run.
Next step. Fill our table with data
i AS i_number
generate_Series(1,10) AS i;
Analysis of the SQL above:
INSERT INTO: This is where we tell PostgreSQL’s SQL engine we are going to insert data into two columns (fields) and we are explictly naming those fields; i_number and i_random_number.
SELECT: Generating the data to go in the two columns we just mentioned. We’re getting “i” later in our FROM section. The next line,
FLOOR(RANDOM()*100)is what gives us a random integar between 0 and 100.
FROM: Uses the
generate_Series(i_start, i_end)function to generate 10 numbers in sequence, like so: 1, 2, 3, 4, 5, 6, 7 8, 9, 10. This will also cause our FROM to generate 10 rows, new rows because of our INSERT statement at the top.
Now, after executing the SQL above, your
test_random table should have 10 rows and look just like this, except with different numbers in the “i_random_number” column:
OK. Yay, the numbers in our
i_random_number column look to be random!
Now, we need to keep in mind we have generated only 10 numbers and really, that’s not enough to know anything about how random our distribution is. Also, we don’t have any duplicates, so no need to utilize our
i_times_picked column yet. So let’s generate more rows using the same query as before, except changing the line in our SQL that says
generate_Series(1,10) AS i; to use “20” instead of “10”. This will add 20 more rows to our 10-row dataset.
i AS i_number
generate_Series(1,20) AS i;
Now our dataset has 30 records with random numbers. Take a look at your dataset. Are there any duplicate numbers in the
i_random_number column (field)? If not, run that INSERT query again. Keep running it until your resulting data set has at least one duplicate in the
i_random_number field. Actually, while you are at it, you may as well set your “generate_series” function to create a few thousand rows. The more you create, the more likely we’ll be to find any patterns that may exist once we execute the UPDATE SQL we’re going to run below.
Optional step: Modify our table
OK, so now, to (a) appease our curiosity about just how random this RANDOM() function is, and (b) bone up on our advanced SQL writing, we’ll figure out how often any given random number was generated, so we can look for any patterns that might exist. In this case, since we want true-as-possible randomness, patterns bad. There is an important reason we are going in this direction and you will see if you stick with us to the end of this tutorial.
Execute the following:
test_random AS tbl_outer
i_times_picked = tbl_inner.i_count_for_this_rnd_num
, COUNT(i_random_number) AS i_count_for_this_rnd_num
) AS tbl_inner
WHERE tbl_outer.i_random_number = tbl_inner.i_random_number
Analysis of the SQL above:
UPDATE – Here we are establishing which table will be modified, as well as giving it an alias of “tbl_outer” so that it can be linked later to the dataset we are generating on the fly that we’ll alias as “tbl_inner”. These aliases can be any name you pick. We counsel you to pick meaningful names that lend themselves to easy readability.
SET – Fill the “i_times_picked” column with “i_count_for_this_rnd_num” generated in the SELECT statement below, aliased as tbl_inner.
SELECT – This is the beginning of the SQL that generates our aggregate group and count.
COUNT – The “meat” of the entire query.
COUNT(i_random_number) AS i_count_for_this_rnd_num looks at the grouping of random numbers, counts all duplicates, and places that count into a variable aliased (named) as i_count_for_this_rnd_num, which is used above (later, though, in the execute process, ha) in our UPDATE … SET statement.
FROM – This establishes the source of the data of our inner “table”.
GROUP BY – Group on the i_random_number field.
AS – Give our “virtual” inner “table” the alias “tbl_inner” so that data in this dataset can be referred to from other parts of the query.
WHERE – This is the glue that binds our two virtual tables together.
After the update (see data below), notice our
i_times_picked column now has data? Unfortunately, there are no repetitions in the i_random_number field. Or… actually, that is fortunate. It means – so far – we have randomness. BUT, as we talked about earlier, that is a small sample; too small to be really meaningful if we are looking for patterns.
So after we add 20 more rows, we finally get a couple repeated random numbers. See the “2” in a couple spots in our
Just how random is the RANDOM function?
OK. So. It worked. AND… even though it appears as if we are getting a random distribution, I can tell you from experience that it’s not as random as it could be.
Rather than bore you with information that you may not care to know, I’ll recommend you search for Postgres’ SETSEED function. Happy hunting!
In this tutorial, we learned when and how to use PostgreSQL’s
RANDOM() function to generate random-ish numbers with SQL statements. We also worked with other functions, including
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.Get Started