Postgres Aggregate Functions
Introduction
In this lesson we learn how to use the following Postgres aggregate functions: AVG function and SUM function, which are both aggregate functions in PostgreSQL. We also explore a statement required for filtering of aggregated data; “HAVING”.
Prerequisites
- An understanding of how SQL queries fit into the Postgres database system and the means to execute them.
- Understanding of common query types and clauses, including SELECT, FROM, and HAVING (which is like WHERE but for aggregate data, as you will learn). If you have no understanding of how “HAVING” works, you have come to the right place as we’ll be learning it here.
Essential to understanding and using aggregate functions in Postgres is learning to GROUP or aggregate table data, so we will begin by studying how the GROUP BY command works.
Postgres GROUP BY clause
The Postgres GROUP BY statement aggregates a set of rows so that we can use group-based (aggregate) functions like Avg, Count, Min, Max, and Sum. The GROUP BY statement is used with SELECT to group the records (rows) in a Postgres table or view that have a specific data “look”. The purpose can be to return values that apply to the group(s) and/or to remove dupes.
Note: GROUP BY is placed after the WHERE or HAVING statements and before ORDER BY.
Syntax of Postgres GROUP BY
1 2 3 4 5 6 7 8 9 10 11 | SELECT value_distinct , Your_Aggregate_Function([name_of_column]) FROM tbl_table_name_here_ WHERE OR HAVING [SOME condition(s) met] GROUP BY value_distinct; ORDER BY [name_column TO sort]; |
Analysis
In the above Group By syntax example, _Your_AggregateFunction() represents functions that act upon a set of rows, as mentioned above.
HAVING or WHERE?
HAVING and WHERE occupy similar but different roles. HAVING can be essential to filtering your aggregated records.
- We use HAVING to limit records in a group. This – instead of WHERE – is what you will typically use when using aggregate functions in your SQL.
- Use WHERE to limit records from any non-aggregated set. This occurs before any groupings are made.
We’ll learn in a bit more detail about HAVING later in this tutorial.
Why and when will we use the GROUP BY clause?
When we need to group data, we use GROUP BY to either remove redundancy or utilize an aggregate function. Below, we will show examples. A simple rule is to use WHERE before GROUP BY and HAVING after GROUP BY. This will work for most situations.
Postgres Group By example
We created and filled the PostgreSQL table as “tbl_functions”:
t_name_function | t_cat | t_operates_on | i_difficulty | i_uses |
---|---|---|---|---|
min | Aggregate | number | 4 | 8 |
max | Aggregate | number | 4 | 8 |
avg | Aggregate | number | 4 | 12 |
count | Aggregate | all | 4 | 14 |
sum | Aggregate | number | 4 | 20 |
char_length | Singular | string | 3 | 32 |
Analysis
- _t_namefunction is the name we’ve stored in Postgres.
- _tcat keeps track of whether the function in this row is for aggregate or singular use. If “Aggregate”, then we know this function uses GROUP BY.
- _t_operateson is whether this function’s parameter requires a string or number.
- _idifficulty is a measurement of the learning curve required to learn this function.
- _iuses is a percentage as integer of how often the function is used in the company’s code base.
Postgres Group By SQL
Now we will use GROUP BY on the “tbl_functions” Postgres table, first choosing to leave out use of functions.
1 2 3 4 5 6 | SELECT t_cat FROM tbl_functions GROUP BY t_cat; |
The Postgres query above returns:
t_cat |
---|
Singular |
Aggregate |
Why is that? With the SQL above, we told PostgreSQL:
- Look at the “t_cat” column only.
- Return only unique values from that column.
Let’s go through the same process with another column in our table:
1 2 3 4 5 6 | SELECT t_operates_on FROM tbl_functions GROUP BY t_operates_on; |
The above SQL returns:
t_operates_on |
---|
number |
all |
string |
Note that in the above returned data, the duplicates in the _t_operateson column were aggregated?
Let’s increase the difficulty by adding in an aggregate function, starting with AVG. We’ll build the Postgres SQL to use the GROUP BY clause along with the AVG aggregate function to see the average i_difficulty to learn and use any of the functions in our company applications.
First, though:
The Postgres AVG function
PostgreSQL’s AVG function returns the average of a group of numbers.
Postgres AVG function syntax
1 | i_average := AVG(group_of_values); |
Postgres AVG function example
Our initial dataset is a table called tbl_languages.
id_technology | t_name | i_dif |
---|---|---|
0 | Python | 8 |
1 | Java | 7 |
2 | C# | 9 |
3 | JavaScript | 8 |
The exercise is to find the average difficulty – see the “i_dif” column – of all four technology ages in the table you see above.
Note for the below sql example. This is where we use our knowledge of SQL GROUPing into the mix. You can think of it this way: If you want to average more than one number, you must look at the entire group of values.
1 2 3 4 5 6 | SELECT AVG(i_dif) FROM tbl_languages GROUP BY i_dif; |
How does the above work? We’ll go piece by piece:
- (1) SELECT: We are telling Postgres’ SQL engine that we want an average of all values (because no WHERE statement) in the _idif column.
- (2) FROM: Specifying _tbllanguages as our data source.
- (3) GROUP BY: Informing PostgreSQL that we are aggregating rows. The AVG() function requires aggregation of the data.
The above query returns the integer, “8”.
Another AVG example
1 2 3 4 5 6 7 | SELECT t_cat , Avg(i_dif) AS avg_of_dif FROM tbl_functions GROUP BY t_cat; |
Analysis
- (1) SELECT: Informs Postgres that we want an average of all values in the _idif column and to name the results to be avg_of_dif. The AVG function requires GROUP BY.
- (2) FROM: The Postgres source data is tbl_functions.
- (3) GROUP BY: Tells PostgreSQL that we want grouping of the _tcat column in tbl_functions.
The above query returns:
t_cat | avg_of_dif |
---|---|
Aggregate | 4 |
Singular | 2.5 |
As you can see, AVG(i_dif) returned an average of all the numbers in the i_dif column.
The Postgres SUM aggregate function
The PostgreSQL SUM() function gives us the sum of a group of numbers.
Sum function syntax
1 | i_sum_total := SUM(group_of_numeric_values); |
Sum function example
Our test dataset is a table called tbl_coding_languages.
id_tech | t_name | i_people_know_it |
---|---|---|
0 | Python | 6 |
1 | Java | 1 |
2 | C# | 3 |
3 | JavaScript | 4 |
The ipeople_know_it column represents the number of developers in your firm who know this language. We want to get a total of all the coders in the company who know these languages. This is where we use GROUP BY. You can think of it this way: If you want to sum _more than one number, you have to add up the group of numbers, since there is no point to retrieving the sum of one number as it will equal itself.
1 2 3 4 5 6 | SELECT SUM(i_people_know_it) FROM tbl_coding_languages GROUP BY i_people_know_it; |
Analysis
- (1) SELECT: Relay to Postgres that we want a sum of all numbers (there is no HAVING filter) in the _i_people_knowit column.
- (2) FROM: Specify _tbl_codinglanguages as our source for data.
- (3) GROUP BY: Instructs PostgreSQL that we are grouping. As mentioned above, the SUM() function REQUIRES aggregation of data.
The query returns “14”.
1 2 3 4 5 6 7 | SELECT t_cat , SUM(i_uses) AS sum_of_uses FROM tbl_functions GROUP BY t_cat; |
Output:
t_cat | sum_of_uses |
---|---|
Aggregate | 63 |
Singular | 119 |
Use HAVING to filter
What if we want to return only a “sum_of_uses” from “tbl_functions” based on when the function category is “Aggregate”? This is where we add the “HAVING” statement to filter, much like the way WHERE works:
1 2 3 4 5 6 7 8 9 | SELECT t_cat , SUM(i_uses) AS sum_of_uses FROM tbl_functions HAVING t_cat = 'Aggregate' GROUP BY t_cat; |
Returns:
t_cat | sum_of_uses |
---|---|
Aggregate | 63 |
Here we see the sum function added up all numbers in the i_uses column HAVING the value of “Aggregate” in the t_cat column of our PostgreSQL table.
Extras
- PostgreSQL’s ROLLUP is a GROUP BY subclause. It is used to distinguish multiple sets of columns you want aggregated.
- Short tutorial on naming conventions here.
Conclusion
We learned how and when to use Postgres aggregate functions such as AVG function and SUM function, which are aggregate functions for PostgreSQL. We also explored the “HAVING” clause in more than one example. Code samples and images included.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started