Using Group By Clause in Postgres SQL

Introduction

In this article, we’ll explore how to use the Group By clause in Postgres to group a set of values. During this tutorial, we’ll use the following structure:

  • What? What does the GROUP BY statement do when fed a dataset?
  • How? How do we best use this clause in our PostgreSQL SQL scripts?
  • Why? When would we make use of this statement? We’ll learn via using a realistic use case. Includes use of Postgres’ “HAVING” clause, which is almost identical to the “WHERE” clause. We’ll also include examples where we use the AVG() and SUM() aggregate functions.

Prerequisites

  • A beginner’s understanding of writing SQL for Postgres (or similar relational-type databases like MS SQL Server, MySQL, Oracle, etc.) using the PG Administration tool, another relational db admin tool, or by with script or programming languages (compiled or not, object-oriented or not) like Python, Java, Javascript, C#, PHP, ASP.Net, VB.Net, Ruby, Node, B4X, etc.) that provide a database connection, as well as a method for sending PL/SQL queries to our database tables, to get data or make changes to your data.
  • Comprehension of the use of common PL/SQL statements, including SELECT, FROM, and HAVING clauses. If you have no exposure to “HAVING” so far, it’s okay; we will be covering it here enough for the basic examples included.
  • Knowledge of what function, integer, and string are and how they work on a beginner level.

What is the GROUP BY clause?

The Postgres GROUP BY command groups a set of values so that we can enact group-level (aggregate) functions like Avg(), Bit_And(), Bit_Or(), Count(), Max(), Min(), Sum(), etc.

The GROUP BY statement in PostgreSQL is used in conjunction with SELECT to group the rows in a table that have identical data. The purpose is remove redundancy in the output and/or return aggregates that apply to the group(s).

In your SQL, the GROUP BY clause comes after the WHERE or HAVING clause and comes before the ORDER BY clause.

How do we use the Group By Statement in SQL?

Syntax

SELECT
    id_not_unique
    , Some_Aggregate_Function(i_measurement)
FROM
    [TABLE_NAME]
WHERE/HAVING
    [conditions]
GROUP BY
    id_not_unique;
ORDER BY
    [column_name TO sort ON];

Some analysis

In the above query, Some_Aggregate_Function() represents functions that act upon a set of values, as mentioned in the section above titled “How do we use the Group By Statement in SQL?”.

On WHERE vs HAVING – A quick explanation

WHERE and HAVING perform similar but different tasks.

  • WHERE clauses are used to filter records from a result. The filter occurs before any groupings are made.
  • The HAVING clause is used to filter values from a group. This is what you will typically use when using aggregate functions in your SQL.

Why and when will we use the GROUP BY clause?

Any time we need to group data that is the same. We use GROUP BY to either remove redundancy or utilize an aggregate function. Below, we will show examples.

Starting with an easy example

We’ve named our initial dataset as a table called functions:

t_name_functiont_category_functiont_operates_oni_difficultyi_uses
char_lengthSingularstring332
lengthSingularstring345
lowerSingularstring221
upperSingularstring220
avgAggregatenumber412
countAggregateall414
minAggregatenumber48
maxAggregatenumber48
sumAggregatenumber420

Explanation of the columns in the table above

  • _t_namefunction is the name of the function.
  • _t_categoryfunction tells us if this function is designed for singular or aggregate input. If “Aggregate,” then we know this is a function that requires the GROUP BY clause.
  • _t_operateson is whether we can feed this function a string, a number, or either/other.
  • _idifficulty is the difficulty to learn and use this function.
  • _iuses is an percentage (expressed as an integer here) of how often the function is used in company-wide code.

Now we will use GROUP BY on the above table. For now, we’ll leave out any functions.

SELECT
    t_category_function
FROM
    functions
GROUP BY
    t_category_function;

The above SQL returns:

t_category_function
Singular
Aggregate

Why is that?

  • We asked for the Postgres database to look at one column; “t_category_function”.
  • We commanded “Look at the data in that column and show only unique values.

With repetition often comes deeper understanding, so let’s do the same with another column in our table:

SELECT
    t_operates_on
FROM
    functions
GROUP BY
    t_operates_on;

The above SQL returns:

t_operates_on
all
number
string

Notice how the duplicates in the t_operates_on field (column) were aggregated?

OK, so now you have seen two examples of a very simplistic use of GROUP BY. Let’s amp up the difficulty just a bit by adding in an aggregate function. We’ll write the SQL script to use the GROUP BY method along with the AVG aggregate function to learn the average difficulty to learn and use any of the functions in our table.

SELECT
    t_category_function
    , Avg(i_difficulty) AS avg_of_difficulty
FROM
    functions
GROUP BY
    t_category_function;

Let’s analyze how the SQL code we just wrote above operates:

  • (1) SELECT: We are relaying to Postgres’ SQL interpretor that we want an average of all values (all, because there is no HAVING statement) in the i_difficulty field (column) and naming the output to be AS avg_of_difficulty. The AVG function requires GROUPing.
  • (2) FROM: Our data source is the functions table.
  • (3) GROUP BY: Telling Postgres that we are grouping the t_category_function table.

The above query returns:

t_category_functionavg_of_difficulty
Aggregate4
Singular2.5

Briefly, we’ll show another example, but using SUM instead of AVG, in order to answer the question, “What is used more, Singular-type functions or Aggregate-type functions?

SELECT
    t_category_function
    , SUM(i_uses) AS sum_of_uses
FROM
    functions
GROUP BY
    t_category_function;

Returns:

t_category_functionsum_of_uses
Aggregate62
Singular118

Use HAVING to filter our recordset

What if we only want to return a “sum_of_uses” based on when the function category is “Aggregate”? This is where we add the “HAVING” clause:

SELECT
    t_category_function
    , SUM(i_uses) AS sum_of_uses
FROM
    functions

-- TWO NEW LINES OF SQL:
HAVING
    t_category_function = 'Aggregate'

GROUP BY
    t_category_function;

Returns:

t_category_functionsum_of_uses
Aggregate62

Miscellany

  • Please feel free to seek out our other articles that use the GROUP BY PostgreSQL clause, along with related aggregate functions like AVG, COUNT, MIN, MAX, and SUM.
  • Are you curious why we named some of our variables and columns with a prefix of “i” or “t“? In this article, we used “i” to mean integer and “t” to mean text or string. Here is a short tutorial on that topic.

Conclusion

We learned here how to use the GROUP BY statement in PostgreSQL. We used a real world example, so as to ease understanding of the many ways GROUP BY can be used when combined with the HAVING clause and the AVG and SUM functions.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.