How to use Postgres Group By
Introduction
This article teaches how to use the Postgres Group By statement. During our lesson, we’ll explore use of the PostgreSQL HAVING clause, as well as the AVG function, COUNT function, MIN function, MAX function, and SUM function, which are all aggregate functions.
Prerequisites
- A base understanding of how to write SQL for Postgres.
- Some knowledge of the use of common SQL queries, including SELECT, FROM, and WHERE (which is like HAVING) clauses. If you have no exposure to “HAVING”, that is fine because it is part of what we are here to learn.
- Understanding of what function, integer, and string do and how they work.
- The meaning of the word “aggregate”:
Using PostgreSQL GROUP BY
The Postgres GROUP BY command groups a set of values so that we can enact group-level (aggregate) functions like Avg(), Bit_Or(), Bit_And(), Count(), Max(), Min(), and Sum().
The GROUP BY statement in PostgreSQL is used in conjunction with SELECT to group the rows (records) in a Postgres table that have a specific data profile. The purpose can be to return aggregates that apply to the group(s) and/or remove redundancy.
In Postgres, GROUP BY comes after the WHERE or HAVING clause and before the ORDER BY.
PostgreSQL GROUP BY syntax
1 2 3 4 5 6 7 8 9 10 11 | SELECT id_distinct , An_Aggregate_Function([name_column]) FROM tbl_name_of_table WHERE OR HAVING condition(s) GROUP BY id_distinct; ORDER BY [name_column TO sort]; |
Analysis of Group By syntax
In the above query, _Some_AggregateFunction() represents functions that act upon a set of values, as mentioned in the section above titled “Using PostgreSQL GROUP BY”.
HAVING vs WHERE
HAVING and WHERE occupy similar but different roles.
- HAVING is used to filter values from a group. This – instead of WHERE – is what you will typically use when using aggregate functions in your SQL.
- WHERE is used to filter records from a result. This occurs before any groupings are made.
Why and when will we use the GROUP BY clause?
When we need to group data, we use GROUP BY to either remove redundancy or utilize an aggregate function. Below, we will show examples. A simple rule is to use WHERE before GROUP BY and HAVING after GROUP BY. This will work for most situations.
Postgres Group By example
We’ve named our initial PostgreSQL table as tbl_functions:
t_name_function | t_category | t_operates_on | i_difficulty | i_uses |
---|---|---|---|---|
min | Aggregate | number | 4 | 8 |
max | Aggregate | number | 4 | 8 |
avg | Aggregate | number | 4 | 12 |
count | Aggregate | all | 4 | 14 |
sum | Aggregate | number | 4 | 20 |
char_length | Singular | string | 3 | 32 |
lower | Singular | string | 2 | 21 |
upper | Singular | string | 2 | 20 |
length | Singular | string | 3 | 45 |
Table columns above explanation
- _t_namefunction is the name we’ve stored in Postgres.
- _tcategory keeps track of whether the function in this row is for aggregate or singular use. If “Aggregate”, then we know this function uses GROUP BY.
- _t_operateson is whether this function’s parameter requires a string or number.
- _idifficulty is a measurement of the learning curve required to learn this function.
- _iuses is a percentage as integer of how often the function is used in the company’s code base.
Postgres Group By SQL
Now we will use GROUP BY on the table, leaving out any functions at first.
1 2 3 4 5 6 | SELECT t_category FROM tbl_functions GROUP BY t_category; |
The Postgres SQL above returns:
t_category |
---|
Singular |
Aggregate |
Why is that? With the SQL above, we told PostgreSQL:
- Look at the “t_category” column only.
- Return only unique values from that column.
Let’s go through the same process with another column in our table:
1 2 3 4 5 6 | SELECT t_operates_on FROM tbl_functions GROUP BY t_operates_on; |
The above SQL returns:
t_operates_on |
---|
number |
all |
string |
Note that in the above returned data, the duplicates in the _t_operateson column were aggregated?
Let’s increase the depth of your understanding by adding in aggregate functions, starting with just one. We’ll build the Postgres SQL to use the GROUP BY clause along with the AVG aggregate function to see the average i_difficulty to learn and use any of the functions in our company applications.
1 2 3 4 5 6 7 | SELECT t_category , Avg(i_difficulty) AS avg_of_difficulty FROM tbl_functions GROUP BY t_category; |
Analysis
- (1) SELECT: Relays to Postgres that we want an average of all values in the _idifficulty column and to name the results to be avg_of_difficulty. The AVG aggregate function requires GROUP BY.
- (2) FROM: The Postgres source data is tbl_functions.
- (3) GROUP BY: Tells PostgreSQL that we want grouping of the _tcategory column in tbl_functions.
The above query returns:
t_category | avg_of_difficulty |
---|---|
Aggregate | 4 |
Singular | 2.5 |
As you can see, AVG(i_difficulty) gave us an average of all the values in the i_difficulty column.
To answer a question that might be on your mind of “What is used more, Aggregate functions or Singular-type?” we’ll use another example – and instead of AVG – we will use another aggregate function; SUM.
1 2 3 4 5 6 7 | SELECT t_category , SUM(i_uses) AS sum_of_uses FROM tbl_functions GROUP BY t_category; |
Output:
t_category | sum_of_uses |
---|---|
Aggregate | 62 |
Singular | 118 |
Use HAVING to filter
What if we only want to return a “sum_of_uses” based on when the function category is “Aggregate”? This is where we add the “HAVING” clause:
1 2 3 4 5 6 7 8 9 10 11 12 | SELECT t_category , SUM(i_uses) AS sum_of_uses FROM functions -- TWO NEW LINES OF SQL: HAVING t_category = 'Aggregate' GROUP BY t_category; |
Returns:
t_category | sum_of_uses |
---|---|
Aggregate | 62 |
Here we see the sum function added up all numbers in the i_uses column HAVING the value of “Aggregate” in the t_category column of our PostgreSQL table.
Extras
- You may wish to investigate Postgres’ ROLLUP, which is a GROUP BY subclause. It is used to define multiple sets of columns you want to group.
- We invite you to look for our other articles using PostgreSQL’s GROUP BY, along with related aggregate functions like AVG, COUNT, MIN, MAX, and SUM.
- Are you curious why we named some of our variables and columns with a prefix of “i” or “t“? In this article, we used “i” to mean integer and “t” to mean text or string. Here is a short tutorial on that topic.
Conclusion
In this tutorial we learned how to use the Postgres Group By clause. We studied use of the PostgreSQL HAVING clause, as well as AVG and SUM functions, which are aggregate functions.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started