How to use Postgres Group By

Introduction

This article teaches how to use the Postgres Group By statement. During our lesson, we’ll explore use of the PostgreSQL HAVING clause, as well as the AVG function, COUNT function, MIN function, MAX function, and SUM function, which are all aggregate functions.

Prerequisites

  • A base understanding of how to write SQL for Postgres.
  • Some knowledge of the use of common SQL queries, including SELECT, FROM, and WHERE (which is like HAVING) clauses. If you have no exposure to “HAVING”, that is fine because it is part of what we are here to learn.
  • Understanding of what function, integer, and string do and how they work.
  • The meaning of the word “aggregate”:

Using PostgreSQL GROUP BY

The Postgres GROUP BY command groups a set of values so that we can enact group-level (aggregate) functions like Avg(), Bit_Or(), Bit_And(), Count(), Max(), Min(), and Sum().

The GROUP BY statement in PostgreSQL is used in conjunction with SELECT to group the rows (records) in a Postgres table that have a specific data profile. The purpose can be to return aggregates that apply to the group(s) and/or remove redundancy.

In Postgres, GROUP BY comes after the WHERE or HAVING clause and before the ORDER BY.

PostgreSQL GROUP BY syntax

SELECT
    id_distinct
    , An_Aggregate_Function([name_column])
FROM
    tbl_name_of_table
WHERE OR HAVING
    condition(s)
GROUP BY
    id_distinct;
ORDER BY
    [name_column TO sort];

Analysis of Group By syntax

In the above query, _Some_AggregateFunction() represents functions that act upon a set of values, as mentioned in the section above titled “Using PostgreSQL GROUP BY”.

HAVING vs WHERE

HAVING and WHERE occupy similar but different roles.

  • HAVING is used to filter values from a group. This – instead of WHERE – is what you will typically use when using aggregate functions in your SQL.
  • WHERE is used to filter records from a result. This occurs before any groupings are made.

Why and when will we use the GROUP BY clause?

When we need to group data, we use GROUP BY to either remove redundancy or utilize an aggregate function. Below, we will show examples. A simple rule is to use WHERE before GROUP BY and HAVING after GROUP BY. This will work for most situations.

Postgres Group By example

We’ve named our initial PostgreSQL table as tbl_functions:

t_name_functiont_categoryt_operates_oni_difficultyi_uses
minAggregatenumber48
maxAggregatenumber48
avgAggregatenumber412
countAggregateall414
sumAggregatenumber420
char_lengthSingularstring332
lowerSingularstring221
upperSingularstring220
lengthSingularstring345

Table columns above explanation

  • _t_namefunction is the name we’ve stored in Postgres.
  • _tcategory keeps track of whether the function in this row is for aggregate or singular use. If “Aggregate”, then we know this function uses GROUP BY.
  • _t_operateson is whether this function’s parameter requires a string or number.
  • _idifficulty is a measurement of the learning curve required to learn this function.
  • _iuses is a percentage as integer of how often the function is used in the company’s code base.

Postgres Group By SQL

Now we will use GROUP BY on the table, leaving out any functions at first.

SELECT
    t_category
FROM
    tbl_functions
GROUP BY
    t_category;

The Postgres SQL above returns:

t_category
Singular
Aggregate

Why is that? With the SQL above, we told PostgreSQL:

  • Look at the “t_category” column only.
  • Return only unique values from that column.

Let’s go through the same process with another column in our table:

SELECT
    t_operates_on
FROM
    tbl_functions
GROUP BY
    t_operates_on;

The above SQL returns:

t_operates_on
number
all
string

Note that in the above returned data, the duplicates in the _t_operateson column were aggregated?

Let’s increase the depth of your understanding by adding in aggregate functions, starting with just one. We’ll build the Postgres SQL to use the GROUP BY clause along with the AVG aggregate function to see the average i_difficulty to learn and use any of the functions in our company applications.

SELECT
    t_category
    , Avg(i_difficulty) AS avg_of_difficulty
FROM
    tbl_functions
GROUP BY
    t_category;

Analysis

  • (1) SELECT: Relays to Postgres that we want an average of all values in the _idifficulty column and to name the results to be avg_of_difficulty. The AVG aggregate function requires GROUP BY.
  • (2) FROM: The Postgres source data is tbl_functions.
  • (3) GROUP BY: Tells PostgreSQL that we want grouping of the _tcategory column in tbl_functions.

The above query returns:

t_categoryavg_of_difficulty
Aggregate4
Singular2.5

As you can see, AVG(i_difficulty) gave us an average of all the values in the i_difficulty column.

To answer a question that might be on your mind of “What is used more, Aggregate functions or Singular-type?” we’ll use another example – and instead of AVG – we will use another aggregate function; SUM.

SELECT
    t_category
    , SUM(i_uses) AS sum_of_uses
FROM
    tbl_functions
GROUP BY
    t_category;

Output:

t_categorysum_of_uses
Aggregate62
Singular118

Use HAVING to filter

What if we only want to return a “sum_of_uses” based on when the function category is “Aggregate”? This is where we add the “HAVING” clause:

SELECT
    t_category
    , SUM(i_uses) AS sum_of_uses
FROM
    functions

-- TWO NEW LINES OF SQL:
HAVING
    t_category = 'Aggregate'

GROUP BY
    t_category;

Returns:

t_categorysum_of_uses
Aggregate62

Here we see the sum function added up all numbers in the i_uses column HAVING the value of “Aggregate” in the t_category column of our PostgreSQL table.

Extras

  • You may wish to investigate Postgres’ ROLLUP, which is a GROUP BY subclause. It is used to define multiple sets of columns you want to group.
  • We invite you to look for our other articles using PostgreSQL’s GROUP BY, along with related aggregate functions like AVG, COUNT, MIN, MAX, and SUM.
  • Are you curious why we named some of our variables and columns with a prefix of “i” or “t“? In this article, we used “i” to mean integer and “t” to mean text or string. Here is a short tutorial on that topic.

Conclusion

In this tutorial we learned how to use the Postgres Group By clause. We studied use of the PostgreSQL HAVING clause, as well as AVG and SUM functions, which are aggregate functions.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.