Using Group By Clause in Postgres SQL
Introduction
In this article, we’ll explore how to use the Group By clause in Postgres to group a set of values. During this tutorial, we’ll use the following structure:
- What? What does the
GROUP BY
statement do when fed a dataset? - How? How do we best use this clause in our PostgreSQL SQL scripts?
- Why? When would we make use of this statement? We’ll learn via using a realistic use case. Includes use of Postgres’ “HAVING” clause, which is almost identical to the “WHERE” clause. We’ll also include examples where we use the AVG() and SUM() aggregate functions.
Prerequisites
- A beginner’s understanding of writing SQL for Postgres (or similar relational-type databases like MS SQL Server, MySQL, Oracle, etc.) using the PG Administration tool, another relational db admin tool, or by with script or programming languages (compiled or not, object-oriented or not) like Python, Java, Javascript, C#, PHP, ASP.Net, VB.Net, Ruby, Node, B4X, etc.) that provide a database connection, as well as a method for sending PL/SQL queries to our database tables, to get data or make changes to your data.
- Comprehension of the use of common PL/SQL statements, including
SELECT
,FROM
, andHAVING
clauses. If you have no exposure to “HAVING” so far, it’s okay; we will be covering it here enough for the basic examples included. - Knowledge of what function, integer, and string are and how they work on a beginner level.
What is the GROUP BY clause?
The Postgres GROUP BY
command groups a set of values so that we can enact group-level (aggregate) functions like Avg(), Bit_And(), Bit_Or(), Count(), Max(), Min(), Sum(), etc.
The GROUP BY statement in PostgreSQL is used in conjunction with SELECT to group the rows in a table that have identical data. The purpose is remove redundancy in the output and/or return aggregates that apply to the group(s).
In your SQL, the GROUP BY clause comes after the WHERE or HAVING clause and comes before the ORDER BY clause.
How do we use the Group By Statement in SQL?
Syntax
1 2 3 4 5 6 7 8 9 10 11 | SELECT id_not_unique , Some_Aggregate_Function(i_measurement) FROM [TABLE_NAME] WHERE/HAVING [conditions] GROUP BY id_not_unique; ORDER BY [column_name TO sort ON]; |
Some analysis
In the above query, Some_Aggregate_Function()
represents functions that act upon a set of values, as mentioned in the section above titled “How do we use the Group By Statement in SQL?”.
On WHERE vs HAVING – A quick explanation
WHERE
and HAVING
perform similar but different tasks.
- WHERE clauses are used to filter records from a result. The filter occurs before any groupings are made.
- The HAVING clause is used to filter values from a group. This is what you will typically use when using aggregate functions in your SQL.
Why and when will we use the GROUP BY clause?
Any time we need to group data that is the same. We use GROUP BY
to either remove redundancy or utilize an aggregate function. Below, we will show examples.
Starting with an easy example
We’ve named our initial dataset as a table called functions:
t_name_function | t_category_function | t_operates_on | i_difficulty | i_uses |
---|---|---|---|---|
char_length | Singular | string | 3 | 32 |
length | Singular | string | 3 | 45 |
lower | Singular | string | 2 | 21 |
upper | Singular | string | 2 | 20 |
avg | Aggregate | number | 4 | 12 |
count | Aggregate | all | 4 | 14 |
min | Aggregate | number | 4 | 8 |
max | Aggregate | number | 4 | 8 |
sum | Aggregate | number | 4 | 20 |
Explanation of the columns in the table above
- _t_namefunction is the name of the function.
- _t_categoryfunction tells us if this function is designed for singular or aggregate input. If “Aggregate,” then we know this is a function that requires the GROUP BY clause.
- _t_operateson is whether we can feed this function a string, a number, or either/other.
- _idifficulty is the difficulty to learn and use this function.
- _iuses is an percentage (expressed as an integer here) of how often the function is used in company-wide code.
Now we will use GROUP BY on the above table. For now, we’ll leave out any functions.
1 2 3 4 5 6 | SELECT t_category_function FROM functions GROUP BY t_category_function; |
The above SQL returns:
t_category_function |
---|
Singular |
Aggregate |
Why is that?
- We asked for the Postgres database to look at one column; “t_category_function”.
- We commanded “Look at the data in that column and show only unique values.
With repetition often comes deeper understanding, so let’s do the same with another column in our table:
1 2 3 4 5 6 | SELECT t_operates_on FROM functions GROUP BY t_operates_on; |
The above SQL returns:
t_operates_on |
---|
all |
number |
string |
Notice how the duplicates in the
t_operates_on
field (column) were aggregated?
OK, so now you have seen two examples of a very simplistic use of GROUP BY. Let’s amp up the difficulty just a bit by adding in an aggregate function. We’ll write the SQL script to use the GROUP BY
method along with the AVG
aggregate function to learn the average difficulty to learn and use any of the functions in our table.
1 2 3 4 5 6 7 | SELECT t_category_function , Avg(i_difficulty) AS avg_of_difficulty FROM functions GROUP BY t_category_function; |
Let’s analyze how the SQL code we just wrote above operates:
- (1)
SELECT
: We are relaying to Postgres’ SQL interpretor that we want an average of all values (all, because there is no HAVING statement) in thei_difficulty
field (column) and naming the output to beAS
avg_of_difficulty
. The AVG function requires GROUPing. - (2)
FROM
: Our data source is thefunctions
table. - (3)
GROUP BY
: Telling Postgres that we are grouping thet_category_function
table.
The above query returns:
t_category_function | avg_of_difficulty |
---|---|
Aggregate | 4 |
Singular | 2.5 |
Briefly, we’ll show another example, but using SUM instead of AVG, in order to answer the question, “What is used more, Singular-type functions or Aggregate-type functions?
1 2 3 4 5 6 7 | SELECT t_category_function , SUM(i_uses) AS sum_of_uses FROM functions GROUP BY t_category_function; |
Returns:
t_category_function | sum_of_uses |
---|---|
Aggregate | 62 |
Singular | 118 |
Use HAVING to filter our recordset
What if we only want to return a “sum_of_uses” based on when the function category is “Aggregate”? This is where we add the “HAVING” clause:
1 2 3 4 5 6 7 8 9 10 11 12 | SELECT t_category_function , SUM(i_uses) AS sum_of_uses FROM functions -- TWO NEW LINES OF SQL: HAVING t_category_function = 'Aggregate' GROUP BY t_category_function; |
Returns:
t_category_function | sum_of_uses |
---|---|
Aggregate | 62 |
Miscellany
- Please feel free to seek out our other articles that use the
GROUP BY
PostgreSQL clause, along with related aggregate functions like AVG, COUNT, MIN, MAX, and SUM. - Are you curious why we named some of our variables and columns with a prefix of “i” or “t“? In this article, we used “i” to mean integer and “t” to mean text or string. Here is a short tutorial on that topic.
Conclusion
We learned here how to use the GROUP BY
statement in PostgreSQL. We used a real world example, so as to ease understanding of the many ways GROUP BY can be used when combined with the HAVING clause and the AVG and SUM functions.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started