# Postgres Aggregate Functions

## Introduction

In this lesson we learn how to use the following Postgres aggregate functions: AVG function and SUM function, which are both aggregate functions in PostgreSQL. We also explore a statement required for filtering of aggregated data; “HAVING”.

## Prerequisites

- An understanding of how SQL queries fit into the Postgres database system and the means to execute them.
- Understanding of common query types and clauses, including SELECT, FROM, and HAVING (which is like WHERE but for aggregate data, as you will learn). If you have no understanding of how “HAVING” works, you have come to the right place as we’ll be learning it here.

Essential to understanding and using aggregate functions in Postgres is learning to GROUP or aggregate table data, so we will begin by studying how the GROUP BY command works.

## Postgres GROUP BY clause

The Postgres *GROUP BY* statement aggregates a set of rows so that we can use group-based (aggregate) functions like Avg, Count, Min, Max, and Sum. The GROUP BY statement is used with SELECT to group the records (rows) in a Postgres table or view that have a specific data “look”. The purpose can be to return values that apply to the group(s) and/or to remove dupes.

Note: GROUP BY is placed *after* the WHERE or HAVING statements and before ORDER BY.

## Syntax of Postgres GROUP BY

1 2 3 4 5 6 7 8 9 10 11 | SELECT value_distinct , Your_Aggregate_Function([name_of_column]) FROM tbl_table_name_here_ WHERE OR HAVING [SOME condition(s) met] GROUP BY value_distinct; ORDER BY [name_column TO sort]; |

Analysis

In the above Group By syntax example, _Your_Aggregate*Function()* represents functions that act upon a set of rows, as mentioned above.

### HAVING or WHERE?

*HAVING* and *WHERE* occupy similar but different roles. HAVING can be essential to filtering your aggregated records.

- We use HAVING to limit records in a group. This – instead of WHERE – is what you will typically use when using aggregate functions in your SQL.
- Use WHERE to limit records from any non-aggregated set. This occurs
*before*any groupings are made.

We’ll learn in a bit more detail about HAVING later in this tutorial.

### Why and when will we use the GROUP BY clause?

When we need to group data, we use *GROUP BY* to either remove redundancy or utilize an aggregate function. Below, we will show examples. A simple rule is to use WHERE before GROUP BY and HAVING after GROUP BY. This will work for *most* situations.

### Postgres Group By example

We created and filled the PostgreSQL table as “tbl_functions”:

t_name_function | t_cat | t_operates_on | i_difficulty | i_uses |
---|---|---|---|---|

min | Aggregate | number | 4 | 8 |

max | Aggregate | number | 4 | 8 |

avg | Aggregate | number | 4 | 12 |

count | Aggregate | all | 4 | 14 |

sum | Aggregate | number | 4 | 20 |

char_length | Singular | string | 3 | 32 |

Analysis

- _t_name
*function*is the name we’ve stored in Postgres. - _t
*cat*keeps track of whether the function in this row is for aggregate or singular use. If “Aggregate”, then we know this function uses GROUP BY. - _t_operates
*on*is whether this function’s parameter requires a string or number. - _i
*difficulty*is a measurement of the learning curve required to learn this function. - _i
*uses*is a percentage as integer of how often the function is used in the company’s code base.

### Postgres Group By SQL

Now we will use GROUP BY on the “tbl_functions” Postgres table, first choosing to leave out use of functions.

1 2 3 4 5 6 | SELECT t_cat FROM tbl_functions GROUP BY t_cat; |

The Postgres query above returns:

t_cat |
---|

Singular |

Aggregate |

Why is that? With the SQL above, we told PostgreSQL:

- Look at the “t_cat” column only.
- Return only unique values from that column.

Let’s go through the same process with another column in our table:

1 2 3 4 5 6 | SELECT t_operates_on FROM tbl_functions GROUP BY t_operates_on; |

The above SQL returns:

t_operates_on |
---|

number |

all |

string |

Note that in the above returned data, the duplicates in the _t_operates*on* column were aggregated?

Let’s increase the difficulty by adding in an aggregate function, starting with AVG. We’ll build the Postgres SQL to use the GROUP BY clause along with the AVG aggregate function to see the average i_difficulty to learn and use any of the functions in our company applications.

First, though:

## The Postgres AVG function

PostgreSQL’s AVG function returns the average of a group of numbers.

### Postgres AVG function syntax

1 | i_average := AVG(group_of_values); |

### Postgres AVG function example

Our initial dataset is a table called tbl_languages.

id_technology | t_name | i_dif |
---|---|---|

0 | Python | 8 |

1 | Java | 7 |

2 | C# | 9 |

3 | JavaScript | 8 |

The exercise is to find the average difficulty – see the “i_dif” column – of all four technology ages in the table you see above.

Note for the below sql example. This is where we use our knowledge of SQL GROUPing into the mix. You can think of it this way: If you want to average *more than one number*, you must look at the entire *group* of values.

1 2 3 4 5 6 | SELECT AVG(i_dif) FROM tbl_languages GROUP BY i_dif; |

How does the above work? We’ll go piece by piece:

- (1)
*SELECT*: We are telling Postgres’ SQL engine that we want an average of all values (because no WHERE statement) in the _i*dif*column. - (2)
*FROM*: Specifying _tbl*languages*as our data source. - (3)
*GROUP BY*: Informing PostgreSQL that we are aggregating rows. The AVG() function**requires**aggregation of the data.

The above query returns the integer, “8”.

### Another AVG example

1 2 3 4 5 6 7 | SELECT t_cat , Avg(i_dif) AS avg_of_dif FROM tbl_functions GROUP BY t_cat; |

Analysis

- (1) SELECT: Informs Postgres that we want an average of all values in the _i
*dif*column and to name the results to be avg_of_dif. The AVG function requires GROUP BY. - (2)
*FROM*: The Postgres source data is tbl_functions. - (3)
*GROUP BY*: Tells PostgreSQL that we want grouping of the _t*cat*column in tbl_functions.

The above query returns:

t_cat | avg_of_dif |
---|---|

Aggregate | 4 |

Singular | 2.5 |

As you can see, AVG(i_dif) returned an average of all the numbers in the i_dif column.

## The Postgres SUM aggregate function

The PostgreSQL SUM() function gives us the sum of a group of numbers.

### Sum function syntax

1 | i_sum_total := SUM(group_of_numeric_values); |

### Sum function example

Our test dataset is a table called **tbl_coding_languages**.

id_tech | t_name | i_people_know_it |
---|---|---|

0 | Python | 6 |

1 | Java | 1 |

2 | C# | 3 |

3 | JavaScript | 4 |

The i*people_know_it column represents the number of developers in your firm who know this language. We want to get a total of all the coders in the company who know these languages. This is where we use GROUP BY. You can think of it this way: If you want to sum _more than one number*, you have to add up the *group* of numbers, since there is no point to retrieving the sum of one number as it will equal itself.

1 2 3 4 5 6 | SELECT SUM(i_people_know_it) FROM tbl_coding_languages GROUP BY i_people_know_it; |

Analysis

- (1)
*SELECT*: Relay to Postgres that we want a sum of*all*numbers (there is no HAVING filter) in the _i_people_know*it*column. - (2)
*FROM*: Specify _tbl_coding*languages*as our source for data. - (3)
*GROUP BY*: Instructs PostgreSQL that we are grouping. As mentioned above, the SUM() function*REQUIRES*aggregation of data.

The query returns “14”.

1 2 3 4 5 6 7 | SELECT t_cat , SUM(i_uses) AS sum_of_uses FROM tbl_functions GROUP BY t_cat; |

Output:

t_cat | sum_of_uses |
---|---|

Aggregate | 63 |

Singular | 119 |

## Use HAVING to filter

What if we want to return only a “sum_of_uses” from “tbl_functions” based on when the function category is “Aggregate”? This is where we add the “HAVING” statement to filter, much like the way WHERE works:

1 2 3 4 5 6 7 8 9 | SELECT t_cat , SUM(i_uses) AS sum_of_uses FROM tbl_functions HAVING t_cat = 'Aggregate' GROUP BY t_cat; |

Returns:

t_cat | sum_of_uses |
---|---|

Aggregate | 63 |

Here we see the sum function added up all numbers in the i_uses column HAVING the value of “Aggregate” in the t_cat column of our PostgreSQL table.

## Extras

- PostgreSQL’s ROLLUP is a GROUP BY subclause. It is used to distinguish multiple sets of columns you want aggregated.
- Short tutorial on naming conventions here.

## Conclusion

We learned how and when to use Postgres aggregate functions such as AVG function and SUM function, which are aggregate functions for PostgreSQL. We also explored the “HAVING” clause in more than one example. Code samples and images included.

## Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started