How to Use PostgreSQL LAG Function
Introduction
In this article we will learn how to use PostgreSQL LAG function. Please review the following prerequisites or if you already have PostgreSQL up and running you can jump straight into the article.
Prerequisites
- Ensure that PostgreSQL server is properly installed, configured and running on the background.
For Linux and Windows systems you can download PostgreSQL here
- To start PostgreSQL server use a LINUX machine use the following command.
1 | sudo service postgresql start |
- To verify if the service is running use the following command.
1 | service postgresql status |
- The result should look something like the following:
1 2 3 4 5 6 7 8 9 | ● postgresql.service - PostgreSQL RDBMS Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor prese Active: active (exited) since Thu 2019-08-01 14:51:20 PST; 36min ago Process: 1230 ExecStart=/bin/true (code=exited, status=0/SUCCESS) Main PID: 1230 (code=exited, status=0/SUCCESS) Aug 01 14:51:20 user-UX330UAK systemd[1]: Starting PostgreSQL RDBMS... Aug 01 14:51:20 user-UX330UAK systemd[1]: Started PostgreSQL RDBMS. lines 1-8/8 (END) |
- To start, stop and restart PostgreSQL server in a Windows machine do the following instruction.
- Open Control Panel
- Open Administrative Tools
- Open Services
- Find the PostgreSQL Server service
- Start, Stop or Restart the service
What is PostgreSQL LAG Function
LAG()
is a PostgreSQL function allows access to a row that comes before the present row at a defined physical offset. This means that from the current row the LAG()
function is able to access data of the previous row, or the row before the previous row, and so on.
The LAG()
function can be beneficial for comparing data from the current and previews table rows.
Below is the syntax for the LAG()
function.
1 2 3 4 5 | LAG(expression [,offset [,default_value]]) OVER ( [PARTITION BY partition_expression, ... ] ORDER BY sort_expression [ASC | DESC], ... ) |
Let’s discuss the above statement part by part.
The
expression
will be evaluated to the row that comes before this row at a fixed offset. It will be a column, expression, or subquery. A single value must be returned by the expression and cannot be a window function.The
offset
is a value that is composed of a number that specifies number of rows that comes before the present row from which to access the information. The offset may be an expression, subquery, or a column. The offset will be defaulted to one (1) should you not specify it.When the
offset
goes beyond the scope of the partition theLAG()
function will return the default_value. Should we take out default_value the function will be returningNULL
.Using
PARTITION BY
will divide the rows into partitions to where theLAG()
function will be applied. As a default, the entire result set will be treated as an individual partition when thePARTITION BY
clause was omitted.ORDER BY
simply specifies the order of the rows for every partition to where theLAG()
function is specified.
PostgreSQL LAG() function examples
Now that we have explain the syntax part by part we will now put that overview understanding to practice by taking some few examples.
Before we proceed with the example we will be creating a table to work with.
Let’s use the below query to create a table named team_stats
.
1 2 3 4 5 6 7 | CREATE TABLE team_stats( team_name VARCHAR(100) NOT NULL, team_id INT NOT NULL, year_game SMALLINT NOT NULL CHECK (year_game > 0), points DECIMAL(12,2) CHECK (points >= 0), PRIMARY KEY (team_name,year_game) ); |
After creating the above gamers_stats
table, we will now insert few rows in the table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | INSERT INTO team_stats(team_name, team_id, year_game, points) VALUES ('Luke SkyWalker',1,2018,180000), ('Mike SkyScraper',2,2018,170000), ('Jack Sparrow',3,2018,140000), ('Jackie Chan',4,2018,300000), ('Jet li',5,2018,300000), ('Luke SkyWalker',1,2019,160000), ('Mike SkyScraper',2,2019,140000), ('Jack Sparrow',3,2019,150000), ('Jackie Chan',4,2019,28000), ('Jet Li',5,2019,250000), ('Luke SkyWalker',1,2020,190000), ('Mike SkyScraper',2,2020,170000), ('Jack Sparrow',3,2020,180000), ('Jackie Chan',4,2020,290000), ('Jet Li',5,2020,270000); |
PostgreSQL Example How to Use LAG() function in Result Set
In this example we will be using the LAG() function to return the total points of the current year (2019) and the previous year (2018):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | WITH cte AS ( SELECT year_game, SUM(points) points FROM team_stats GROUP BY year_game ORDER BY year_game ) SELECT year_game, points, LAG(points,1) OVER ( ORDER BY year_game ) previous_year_points FROM cte; |
The result should resemble something like the following:
1 2 3 4 5 6 | year_game | points | previous_year_points -----------+------------+---------------------- 2018 | 1090000.00 | 2019 | 728000.00 | 1090000.00 2020 | 1100000.00 | 728000.00 (3 ROWS) |
In the above query, the CTE will return the total points summarized by year.
Then, the outer query Will be using the LAG()
function to return the total points of the previous year for each row. The first row has NULL in the previous_year_points
column because there is no previous year of the first row.
In this another example we are using three common table expressions to return the points difference or variance between the current and previous years:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | WITH cte AS ( SELECT year_game, SUM(points) points FROM team_stats GROUP BY year_game ORDER BY year_game ), cte2 AS ( SELECT year_game, points, LAG(points,1) OVER ( ORDER BY year_game ) previous_year_points FROM cte ) SELECT year_game, points, previous_year_points, (previous_year_points - points) variance FROM cte2; |
The result should resemble something like the following.
1 2 3 4 5 6 | year_game | points | previous_year_points | variance -----------+------------+----------------------+------------ 2018 | 1090000.00 | | 2019 | 728000.00 | 1090000.00 | 362000.00 2020 | 1100000.00 | 728000.00 | -372000.00 (3 ROWS) |
Example How to Use PostgreSQL LAG() function in Partitions
In this example we will be using the LAG()
function to compare the total points of the current year with the total points of the previous year of each team.
1 2 3 4 5 6 7 8 9 10 | SELECT year_game, points, team_id, LAG(points,1) OVER ( PARTITION BY team_id ORDER BY year_game ) previous_year_points FROM team_stats; |
The result should resemble something like the following.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | year_game | points | team_id | previous_year_points -----------+-----------+---------+---------------------- 2018 | 180000.00 | 1 | 2019 | 160000.00 | 1 | 180000.00 2020 | 190000.00 | 1 | 160000.00 2018 | 170000.00 | 2 | 2019 | 140000.00 | 2 | 170000.00 2020 | 170000.00 | 2 | 140000.00 2018 | 140000.00 | 3 | 2019 | 150000.00 | 3 | 140000.00 2020 | 180000.00 | 3 | 150000.00 2018 | 300000.00 | 4 | 2019 | 28000.00 | 4 | 300000.00 2020 | 290000.00 | 4 | 28000.00 2018 | 300000.00 | 5 | 2019 | 250000.00 | 5 | 300000.00 2020 | 270000.00 | 5 | 250000.00 (15 ROWS) |
- The
PARTITION BY
will distribute the rows into teams (or partitions) specified by column team_id. ORDER BY
will be sorting the rows in each team by years in ascending order.- Finally, the
LAG()
function will be applied to each partition to return the points of the previous year.
Conclusion
In this tutorial, you have learned how to use the PostgreSQL LAG()
function. We hope you’re able to apply what you’ve learned here to your issue or application.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started