How to use PostgreSQL CUME_DIST Function
Introduction
In this article we will learn about the PostgreSQL CUME_DIST Function and show some concrete code examples on its usage.
Prerequisites
- Ensure that PostgreSQL server is properly installed, configured and running on the background.
For Linux and Windows systems you can download PostgreSQL here
- To start PostgreSQL server use a LINUX machine use the following command.
| 1 | sudo service postgresql start | 
- To verify if the service is running use the following command.
| 1 | service postgresql status | 
- The result should look something like the following:
| 1 2 3 4 5 6 7 8 9 | ● postgresql.service - PostgreSQL RDBMS Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor prese Active: active (exited) since Thu 2019-08-01 14:51:20 PST; 36min ago Process: 1230 ExecStart=/bin/true (code=exited, status=0/SUCCESS) Main PID: 1230 (code=exited, status=0/SUCCESS) Aug 01 14:51:20 user-UX330UAK systemd[1]: Starting PostgreSQL RDBMS... Aug 01 14:51:20 user-UX330UAK systemd[1]: Started PostgreSQL RDBMS. lines 1-8/8 (END) | 
- To start, stop and restart PostgreSQL server in a Windows machine do the following instruction.
- Open Control Panel
- Open Administrative Tools
- Open Services
- Find the PostgreSQL Server service
- Start, Stop or Restart the service

What is CUME_DIST function
The CUME_DIST() perform returns the additive distribution of a value among a group of values. We can also say that, it returns the relative position of a value in a very set of values.
The syntax of the CUME_DIST() is as follows.
| 1 2 3 4 | CUME_DIST() OVER ( [PARTITION BY partition_expression, ... ] ORDER BY sort_expression [ASC | DESC], ... ) | 
- The PARTITION BYclause will divide the rows into multiple partitions to to which the function is defined.
- The ORDER BYclause will be sorting the rows in each of the partition to which theCUME_DIST()function is applied.
NOTE: The function will return the similar cumulative distribution values for the similar tie values.
CUME_DIST function examples
Now that we have an overview understanding about the CUME_DIST function, let’s try to use it with some basic examples.
But before we proceed with the application of the function we need to create a table.
| 1 2 3 4 5 6 | CREATE TABLE gamers_stats( name VARCHAR(100) NOT NULL, YEAR SMALLINT NOT NULL CHECK (YEAR > 0), points DECIMAL(10,2) CHECK (points >= 0), PRIMARY KEY (name,YEAR) ); | 
After creating the above gamers_stats table, we will now insert some rows.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 | INSERT INTO gamers_stats(name, YEAR, points) VALUES ('Luke SkyWalker',2018,180000), ('Mike SkyScraper',2018,170000), ('Jack Sparrow',2018,140000), ('Jackie Chan',2018,300000), ('Jet li',2018,300000), ('Luke SkyWalker',2019,160000), ('Mike SkyScraper',2019,140000), ('Jack Sparrow',2019,150000), ('Jackie Chan',2019,28000), ('Jet Li',2019,250000); | 
PostgreSQL CUME_DIST partition example
Let’s create a query using the CUME_DIST() function to calculate the points percentile for each gamers in 2018 and 2019.
| 1 2 3 4 5 6 7 8 9 10 | SELECT name, YEAR, points, CUME_DIST() OVER ( PARTITION BY YEAR ORDER BY points ) FROM gamers_stats; | 
The result should look something like this:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 |       name       | YEAR |  points   | cume_dist -----------------+------+-----------+----------- Jack Sparrow | 2018 | 140000.00 | 0.2 Mike SkyScraper | 2018 | 170000.00 | 0.4 Luke SkyWalker | 2018 | 180000.00 | 0.6 Jackie Chan | 2018 | 300000.00 | 1 Jet li | 2018 | 300000.00 | 1 Jackie Chan | 2019 | 28000.00 | 0.2 Mike SkyScraper | 2019 | 140000.00 | 0.4 Jack Sparrow | 2019 | 150000.00 | 0.6 Luke SkyWalker | 2019 | 160000.00 | 0.8 Jet Li | 2019 | 250000.00 | 1 (10 ROWS) | 
Let’s discuss what happened in the above query.
- First we used the - PARTITION BYclause to be able to divided the rows into two partitions years 2018 and 2019.
- Then we use - ORDER BYclause to sort the points of every gamers in each partition from high to low to where we intended the- CUME_DIST()function to be applied.
PostgreSQL CUME_DIST over a result set example
The following example returns the gamer points percentage for each points of the gamer(s) in 2018:
| 1 2 3 4 5 6 7 8 9 10 11 | SELECT name, YEAR, points, CUME_DIST() OVER ( ORDER BY points ) FROM gamers_stats WHERE YEAR = 2018; | 
The result should look something like the following.
| 1 2 3 4 5 6 7 8 |       name       | YEAR |  points   | cume_dist -----------------+------+-----------+----------- Jack Sparrow | 2018 | 140000.00 | 0.2 Mike SkyScraper | 2018 | 170000.00 | 0.4 Luke SkyWalker | 2018 | 180000.00 | 0.6 Jackie Chan | 2018 | 300000.00 | 1 Jet li | 2018 | 300000.00 | 1 (5 ROWS) | 
The above result shows that 60% of gamers have points less than or equal to 180K in 2018.
Conclusion
In this article we talked about what the PostgreSQL CUME_DIST does and how to use it with a simple code example. If you’re still having trouble with CUME_DIST or PostgreSQL in general, please don’t hesitate to contact us at Object Rocket because we take pride in solving database problems.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started 
						 
							 
	
	
