How to use PostgreSQL CUME_DIST Function
Introduction
In this article we will learn about the PostgreSQL CUME_DIST Function and show some concrete code examples on its usage.
Prerequisites
- Ensure that PostgreSQL server is properly installed, configured and running on the background.
For Linux and Windows systems you can download PostgreSQL here
- To start PostgreSQL server use a LINUX machine use the following command.
1 | sudo service postgresql start |
- To verify if the service is running use the following command.
1 | service postgresql status |
- The result should look something like the following:
1 2 3 4 5 6 7 8 9 | ● postgresql.service - PostgreSQL RDBMS Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor prese Active: active (exited) since Thu 2019-08-01 14:51:20 PST; 36min ago Process: 1230 ExecStart=/bin/true (code=exited, status=0/SUCCESS) Main PID: 1230 (code=exited, status=0/SUCCESS) Aug 01 14:51:20 user-UX330UAK systemd[1]: Starting PostgreSQL RDBMS... Aug 01 14:51:20 user-UX330UAK systemd[1]: Started PostgreSQL RDBMS. lines 1-8/8 (END) |
- To start, stop and restart PostgreSQL server in a Windows machine do the following instruction.
- Open Control Panel
- Open Administrative Tools
- Open Services
- Find the PostgreSQL Server service
- Start, Stop or Restart the service
What is CUME_DIST function
The CUME_DIST()
perform returns the additive distribution of a value among a group of values. We can also say that, it returns the relative position of a value in a very set of values.
The syntax of the CUME_DIST()
is as follows.
1 2 3 4 | CUME_DIST() OVER ( [PARTITION BY partition_expression, ... ] ORDER BY sort_expression [ASC | DESC], ... ) |
- The
PARTITION BY
clause will divide the rows into multiple partitions to to which the function is defined. - The
ORDER BY
clause will be sorting the rows in each of the partition to which theCUME_DIST()
function is applied.
NOTE: The function will return the similar cumulative distribution values for the similar tie values.
CUME_DIST function examples
Now that we have an overview understanding about the CUME_DIST
function, let’s try to use it with some basic examples.
But before we proceed with the application of the function we need to create a table.
1 2 3 4 5 6 | CREATE TABLE gamers_stats( name VARCHAR(100) NOT NULL, YEAR SMALLINT NOT NULL CHECK (YEAR > 0), points DECIMAL(10,2) CHECK (points >= 0), PRIMARY KEY (name,YEAR) ); |
After creating the above gamers_stats
table, we will now insert some rows.
1 2 3 4 5 6 7 8 9 10 11 12 13 | INSERT INTO gamers_stats(name, YEAR, points) VALUES ('Luke SkyWalker',2018,180000), ('Mike SkyScraper',2018,170000), ('Jack Sparrow',2018,140000), ('Jackie Chan',2018,300000), ('Jet li',2018,300000), ('Luke SkyWalker',2019,160000), ('Mike SkyScraper',2019,140000), ('Jack Sparrow',2019,150000), ('Jackie Chan',2019,28000), ('Jet Li',2019,250000); |
PostgreSQL CUME_DIST partition example
Let’s create a query using the CUME_DIST() function to calculate the points percentile for each gamers in 2018 and 2019.
1 2 3 4 5 6 7 8 9 10 | SELECT name, YEAR, points, CUME_DIST() OVER ( PARTITION BY YEAR ORDER BY points ) FROM gamers_stats; |
The result should look something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 | name | YEAR | points | cume_dist -----------------+------+-----------+----------- Jack Sparrow | 2018 | 140000.00 | 0.2 Mike SkyScraper | 2018 | 170000.00 | 0.4 Luke SkyWalker | 2018 | 180000.00 | 0.6 Jackie Chan | 2018 | 300000.00 | 1 Jet li | 2018 | 300000.00 | 1 Jackie Chan | 2019 | 28000.00 | 0.2 Mike SkyScraper | 2019 | 140000.00 | 0.4 Jack Sparrow | 2019 | 150000.00 | 0.6 Luke SkyWalker | 2019 | 160000.00 | 0.8 Jet Li | 2019 | 250000.00 | 1 (10 ROWS) |
Let’s discuss what happened in the above query.
First we used the
PARTITION BY
clause to be able to divided the rows into two partitions years 2018 and 2019.Then we use
ORDER BY
clause to sort the points of every gamers in each partition from high to low to where we intended theCUME_DIST()
function to be applied.
PostgreSQL CUME_DIST over a result set example
The following example returns the gamer points percentage for each points of the gamer(s) in 2018:
1 2 3 4 5 6 7 8 9 10 11 | SELECT name, YEAR, points, CUME_DIST() OVER ( ORDER BY points ) FROM gamers_stats WHERE YEAR = 2018; |
The result should look something like the following.
1 2 3 4 5 6 7 8 | name | YEAR | points | cume_dist -----------------+------+-----------+----------- Jack Sparrow | 2018 | 140000.00 | 0.2 Mike SkyScraper | 2018 | 170000.00 | 0.4 Luke SkyWalker | 2018 | 180000.00 | 0.6 Jackie Chan | 2018 | 300000.00 | 1 Jet li | 2018 | 300000.00 | 1 (5 ROWS) |
The above result shows that 60% of gamers have points less than or equal to 180K in 2018.
Conclusion
In this article we talked about what the PostgreSQL CUME_DIST does and how to use it with a simple code example. If you’re still having trouble with CUME_DIST or PostgreSQL in general, please don’t hesitate to contact us at Object Rocket because we take pride in solving database problems.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started