How to use PostgreSQL CUME_DIST Function

Introduction

In this article we will learn about the PostgreSQL CUME_DIST Function and show some concrete code examples on its usage.

Prerequisites

  • Ensure that PostgreSQL server is properly installed, configured and running on the background.

For Linux and Windows systems you can download PostgreSQL here

  • To start PostgreSQL server use a LINUX machine use the following command.
sudo service postgresql start
  • To verify if the service is running use the following command.
service postgresql status
  • The result should look something like the following:
● postgresql.service - PostgreSQL RDBMS
   Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor prese
   Active: active (exited) since Thu 2019-08-01 14:51:20 PST; 36min ago
  Process: 1230 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 1230 (code=exited, status=0/SUCCESS)

Aug 01 14:51:20 user-UX330UAK systemd[1]: Starting PostgreSQL RDBMS...
Aug 01 14:51:20 user-UX330UAK systemd[1]: Started PostgreSQL RDBMS.
lines 1-8/8 (END)
  • To start, stop and restart PostgreSQL server in a Windows machine do the following instruction.
  1. Open Control Panel
  2. Open Administrative Tools
  3. Open Services
  4. Find the PostgreSQL Server service
  5. Start, Stop or Restart the service

Restarting PostgreSQL server in a Windows Machine

What is CUME_DIST function

The CUME_DIST() perform returns the additive distribution of a value among a group of values. We can also say that, it returns the relative position of a value in a very set of values.

The syntax of the CUME_DIST() is as follows.

CUME_DIST() OVER (
    [PARTITION BY partition_expression, ... ]
    ORDER BY sort_expression [ASC | DESC], ...
)
  1. The PARTITION BY clause will divide the rows into multiple partitions to to which the function is defined.
  2. The ORDER BY clause will be sorting the rows in each of the partition to which the CUME_DIST() function is applied.

NOTE: The function will return the similar cumulative distribution values for the similar tie values.

CUME_DIST function examples

Now that we have an overview understanding about the CUME_DIST function, let’s try to use it with some basic examples.

But before we proceed with the application of the function we need to create a table.

CREATE TABLE gamers_stats(
    name VARCHAR(100) NOT NULL,
    YEAR SMALLINT NOT NULL CHECK (YEAR > 0),
    points DECIMAL(10,2) CHECK (points >= 0),
    PRIMARY KEY (name,YEAR)
);

After creating the above gamers_stats table, we will now insert some rows.

INSERT INTO
    gamers_stats(name, YEAR, points)
VALUES
    ('Luke SkyWalker',2018,180000),
    ('Mike SkyScraper',2018,170000),
    ('Jack Sparrow',2018,140000),
    ('Jackie Chan',2018,300000),
    ('Jet li',2018,300000),
    ('Luke SkyWalker',2019,160000),
    ('Mike SkyScraper',2019,140000),
    ('Jack Sparrow',2019,150000),
    ('Jackie Chan',2019,28000),
    ('Jet Li',2019,250000);

PostgreSQL CUME_DIST partition example

Let’s create a query using the CUME_DIST() function to calculate the points percentile for each gamers in 2018 and 2019.

SELECT
    name,
   YEAR,
   points,
    CUME_DIST() OVER (
      PARTITION BY YEAR
        ORDER BY points
    )
FROM
    gamers_stats;

The result should look something like this:

      name       | YEAR |  points   | cume_dist
-----------------+------+-----------+-----------
 Jack Sparrow    | 2018 | 140000.00 |       0.2
 Mike SkyScraper | 2018 | 170000.00 |       0.4
 Luke SkyWalker  | 2018 | 180000.00 |       0.6
 Jackie Chan     | 2018 | 300000.00 |         1
 Jet li          | 2018 | 300000.00 |         1
 Jackie Chan     | 2019 |  28000.00 |       0.2
 Mike SkyScraper | 2019 | 140000.00 |       0.4
 Jack Sparrow    | 2019 | 150000.00 |       0.6
 Luke SkyWalker  | 2019 | 160000.00 |       0.8
 Jet Li          | 2019 | 250000.00 |         1
(10 ROWS)

Let’s discuss what happened in the above query.

  1. First we used the PARTITION BY clause to be able to divided the rows into two partitions years 2018 and 2019.

  2. Then we use ORDER BY clause to sort the points of every gamers in each partition from high to low to where we intended the CUME_DIST() function to be applied.

PostgreSQL CUME_DIST over a result set example

The following example returns the gamer points percentage for each points of the gamer(s) in 2018:

SELECT
    name,
    YEAR,
    points,
    CUME_DIST() OVER (
        ORDER BY points
    )
FROM
    gamers_stats
WHERE
    YEAR = 2018;

The result should look something like the following.

      name       | YEAR |  points   | cume_dist
-----------------+------+-----------+-----------
 Jack Sparrow    | 2018 | 140000.00 |       0.2
 Mike SkyScraper | 2018 | 170000.00 |       0.4
 Luke SkyWalker  | 2018 | 180000.00 |       0.6
 Jackie Chan     | 2018 | 300000.00 |         1
 Jet li          | 2018 | 300000.00 |         1
(5 ROWS)

The above result shows that 60% of gamers have points less than or equal to 180K in 2018.

Conclusion

In this article we talked about what the PostgreSQL CUME_DIST does and how to use it with a simple code example. If you’re still having trouble with CUME_DIST or PostgreSQL in general, please don’t hesitate to contact us at Object Rocket because we take pride in solving database problems.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.