Using Group By Clause in Postgres SQL
In this article, we’ll explore how to use the Group By clause in Postgres to group a set of values. During this tutorial, we’ll use the following structure:
- What? What does the
GROUP BYstatement do when fed a dataset?
- How? How do we best use this clause in our PostgreSQL SQL scripts?
- Why? When would we make use of this statement? We’ll learn via using a realistic use case. Includes use of Postgres’ “HAVING” clause, which is almost identical to the “WHERE” clause. We’ll also include examples where we use the AVG() and SUM() aggregate functions.
- Comprehension of the use of common PL/SQL statements, including
HAVINGclauses. If you have no exposure to “HAVING” so far, it’s okay; we will be covering it here enough for the basic examples included.
- Knowledge of what function, integer, and string are and how they work on a beginner level.
What is the GROUP BY clause?
GROUP BY command groups a set of values so that we can enact group-level (aggregate) functions like Avg(), Bit_And(), Bit_Or(), Count(), Max(), Min(), Sum(), etc.
The GROUP BY statement in PostgreSQL is used in conjunction with SELECT to group the rows in a table that have identical data. The purpose is remove redundancy in the output and/or return aggregates that apply to the group(s).
In your SQL, the GROUP BY clause comes after the WHERE or HAVING clause and comes before the ORDER BY clause.
How do we use the Group By Statement in SQL?
[column_name TO sort ON];
In the above query,
Some_Aggregate_Function() represents functions that act upon a set of values, as mentioned in the section above titled “How do we use the Group By Statement in SQL?”.
On WHERE vs HAVING – A quick explanation
HAVING perform similar but different tasks.
- WHERE clauses are used to filter records from a result. The filter occurs before any groupings are made.
- The HAVING clause is used to filter values from a group. This is what you will typically use when using aggregate functions in your SQL.
Why and when will we use the GROUP BY clause?
Any time we need to group data that is the same. We use
GROUP BY to either remove redundancy or utilize an aggregate function. Below, we will show examples.
Starting with an easy example
We’ve named our initial dataset as a table called functions:
Explanation of the columns in the table above
- _t_namefunction is the name of the function.
- _t_categoryfunction tells us if this function is designed for singular or aggregate input. If “Aggregate,” then we know this is a function that requires the GROUP BY clause.
- _t_operateson is whether we can feed this function a string, a number, or either/other.
- _idifficulty is the difficulty to learn and use this function.
- _iuses is an percentage (expressed as an integer here) of how often the function is used in company-wide code.
Now we will use GROUP BY on the above table. For now, we’ll leave out any functions.
The above SQL returns:
Why is that?
- We asked for the Postgres database to look at one column; “t_category_function”.
- We commanded “Look at the data in that column and show only unique values.
With repetition often comes deeper understanding, so let’s do the same with another column in our table:
The above SQL returns:
Notice how the duplicates in the
t_operates_onfield (column) were aggregated?
OK, so now you have seen two examples of a very simplistic use of GROUP BY. Let’s amp up the difficulty just a bit by adding in an aggregate function. We’ll write the SQL script to use the
GROUP BY method along with the
AVG aggregate function to learn the average difficulty to learn and use any of the functions in our table.
, Avg(i_difficulty) AS avg_of_difficulty
Let’s analyze how the SQL code we just wrote above operates:
SELECT: We are relaying to Postgres’ SQL interpretor that we want an average of all values (all, because there is no HAVING statement) in the
i_difficultyfield (column) and naming the output to be
avg_of_difficulty. The AVG function requires GROUPing.
FROM: Our data source is the
GROUP BY: Telling Postgres that we are grouping the
The above query returns:
Briefly, we’ll show another example, but using SUM instead of AVG, in order to answer the question, “What is used more, Singular-type functions or Aggregate-type functions?
, SUM(i_uses) AS sum_of_uses
Use HAVING to filter our recordset
What if we only want to return a “sum_of_uses” based on when the function category is “Aggregate”? This is where we add the “HAVING” clause:
, SUM(i_uses) AS sum_of_uses
-- TWO NEW LINES OF SQL:
t_category_function = 'Aggregate'
- Please feel free to seek out our other articles that use the
GROUP BYPostgreSQL clause, along with related aggregate functions like AVG, COUNT, MIN, MAX, and SUM.
- Are you curious why we named some of our variables and columns with a prefix of “i” or “t“? In this article, we used “i” to mean integer and “t” to mean text or string. Here is a short tutorial on that topic.
We learned here how to use the
GROUP BY statement in PostgreSQL. We used a real world example, so as to ease understanding of the many ways GROUP BY can be used when combined with the HAVING clause and the AVG and SUM functions.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.Get Started