Group By in TimescaleDB

Have a Database Problem? Speak with an Expert for Free
Get Started >>

Introduction

In this lesson document, we’ll investigate use of Group By in TimescaleDB. During this lesson, we’ll use the following structure:

  • What does GROUP BY in SQL look like and do in TimescaleDB databases?
  • How do we use GROUP BY in our TimescaleDB SQL scripts?
  • When in TimescaleDB do we use the GROUP BY statement? We will answer this by focused on a realistic use case. Includes use of TimescaleDB’s HAVING SQL clause, which is almost identical to the WHERE SQL clause. In addition to that, we’ll include some examples where you may notice we make use of the AVG and SUM aggregate functions.

Prerequisites

  • A beginner or novice level of understanding of building SQL queries to add, remove, or view data from a TimescaleDB database (or similar databases like MS Access, SQL Server, Postgres, Oracle, etc.) using the DBeaver tool, another relational db admin tool, and/or with script or languages like Python, Java, Node, VB.Net, C#, PHP, ASP.Net, B4a, B4i, etc. that provide us with a database connection, as well as a method for executing SQL on the database tables.
  • Comprehension of common SQL statements, including SELECT and FROM, as well as clauses like WHERE and HAVING, and operators that include LIKE. If you have no exposure up to now with the HAVING clause, it’s okay, as we’ll be learning about it here.

The first thing to do is set up a database instance. We prefer using Objectrocket because we like the amount of value and efficiency they provide.

Create a TimescaleDB database instance on ObjectRocket

  • (1) Go to kb.objectrocket.com and “Log In”.
  • (2) Use left menu to “Create Instance”.
  • (3) In the “Name” field, add a descriptive name for your instance.
  • (4) Under “Select your service” choose the system you want to use from “CockroachDB, Elasticsearch, PostgreSQL, Redis, TimescaleDB, and MongoDB”.
  • (5) Choose the “Cloud Provider” and “Type” you want.
  • (6) Select “Version” and “Region”.
  • (7) Click the “GO TO STEP 2” button.
  • (8) Make a choice in the “Pick Your Flavor” section.
  • (9) Choose the “Capacity (Storage/Memory)”. Notice how this choice influences your total fee at the bottom right.
  • (10) Click the “GO TO STEP 3” button.
  • (11) Under “Add a Whitelist IP” pick “ALOW ANY IP” or “USE MY IP” or “Add my IPs later”.
  • (12) Choose between “Master” and “Replica” and click the “ADD” button.
  • (13) Now click the “CREATE INSTANCE” button at the bottom.

Before we start building queries, let’s look at how to use Flask’s psycopg2 to connect to a TimescaleDB instance at ObjectRocket.

Connect to TimescaleDB

1
2
3
4
5
6
7
8
9
t_dbname = "myTSdb"
t_name_user = "tsdbadmin"
t_password = "secret"
t_sslmode = "require"
t_host = "ingress.hkybrhnz.launchpad.objectrocket.cloud"
t_port = "4129"
connection_string = "dbname=" & t_dbname & " user=" & t_name_user & " password=" & t_password & " host=" & t_host & " port=" & t_port & " sslmode=" & sslmode
db_conn = psycopg2.connect(connection_string)
db_cursor = db_conn.cursor()

What’s GROUP BY?

The TimescaleDB GROUP BY command creates a grouping of a set of values so that you can use group-level (aggregate) functions like Avg, Bit_And, Bit_Or, Count, Max, Min, and Sum upon the data.

You can look at it this way: the GROUP BY statement in TimescaleDB is used along with SELECT to group records in a table that have the same data. The purpose here is to remove redundancy in the output and/or to return aggregates that apply to the grouped data.

In your TimescaleDB SQL, the GROUP BY statement comes after the WHERE or HAVING clause and comes before the ORDER BY statement.

Using Group By

Syntax of Group By

1
2
3
4
5
6
7
8
9
10
11
SELECT
    idNotUnique
    , Some_Aggregate_Func(intMeasurement)
FROM
    [tbl_name]
WHERE/HAVING
    [my_conditions]
GROUP BY
    idNotUnique;
ORDER BY
    [colNameToSortOn];

Analysis of Group By syntax:

In the above SQL, “Some_Aggregate_Func” represents a function that acts upon a group of values, as mentioned in the prior section.

WHERE vs HAVING

HAVING and WHERE perform equivalent but different roles.

  • The HAVING statement is for filtering values from a set. This is what you’ll typically use when using aggregate functions in your SQL. A GROUP BY is required.
  • WHERE clauses are used to filter rows from a result. This filtering happens before any groupings are made. This statement can be used with or without GROUP BY being in the overall SQL script.

Why use GROUP BY?

Any time we want to group data that has common values. We use GROUP BY to either remove redundancy or make use of an aggregate function like SUM, for example. Below, are more examples.

Aggregate functions

We have named the following dataset as a table called tblFunctions:

txtNameFunctiontxtCategorytxtOperatesOnintDifficultyintUses
char_lengthSingularstring432
lengthSingularstring445
lowerSingularstring321
upperSingularstring320
avgAggregatenumber512
countAggregateall514
minAggregatenumber58
maxAggregatenumber58
sumAggregatenumber520

Analysis of the columns and records in the above table:

  • txtNameFunction is the name of the function.
  • txtCategory instructs if this function’s purpose is for singular or aggregates. If “Aggregate,” then we understand this is a function that needs a GROUP BY clause be in the SQL script.
  • txtOperatesOn is whether we can feed this function a string, a number, or “all” (both).
  • intDifficulty is the difficulty to learn, know, and make use of the function.
  • intUses is a percentage value (expressed as an integer) of how often the function is used in code throughout the company’s applications.

Next we’ll use Timescale’s GROUP BY on the above table and for now, we’ll be leaving out any functions.

Group By example

1
2
3
SELECT txtCategory
FROM tblFunctions
GROUP BY txtCategory;

The above TimescaleDB query returns:

txtCategory
Singular
Aggregate

Why are we learning no numbers in the rowset returned?

  • We instructed the TimescaleDB database to study only one column, which is “txtCategory”.
  • We pretty much said “Study the data in that one column and show only the unique values.

With some repetition, this will all become more clear, so now we will do the same with another column in our table:

1
2
3
4
5
6
SELECT
    txtOperatesOn
FROM
    tblFunctions
GROUP BY
    txtOperatesOn;

The above TimescaleDB query returns:

txtOperatesOn
all
number
string

Notice how duplicate values in the txtOperatesOn column were aggregated?

Now you have seen two examples of a super simple use of GROUP BY. Now we’ll increase the difficulty just a little by adding in an aggregate function to glean some obvious use out of this situation. We will write the SQL to use the GROUP BY method along with the added AVG function to see the average difficulty of all the functions in the TimescaleDB table we named tblFunctions.

1
2
3
4
5
6
7
SELECT
    txtCategory
    , Avg(intDifficulty) AS numAvgDifficulty
FROM
    tblFunctions
GROUP BY
    txtCategory;

Now to analyze what the Timescale SQL script we wrote above attaines:

  • SELECT: We’re relaying to TimescaleDB’s interpretor that we need to retrieve an average of all the values (because there is no HAVING statement) in the intDifficulty column, while naming the output to be AS numAvgDifficulty. The AVG function needs GROUP BY.
  • FROM: Our TimescaleDB data source is the tblFunctions table.
  • GROUP BY: Instructs TimescaleDB that we’re grouping the txtCategory column.

The above query returns:

txtCategorynumAvgDifficulty
Aggregate4
Singular2.5

Next, we’ll show another example, but now using the SUM function instead of the AVG function (both aggregate), in order to find the answer to the question, “What’s used more often in our company, Singular functions or Aggregate functions?

1
2
3
4
5
6
7
SELECT
    txtCategory
    , SUM(intUses) AS intSumOfUses
FROM
    tblFunctions
GROUP BY
    txtCategory;

Returns:

txtCategoryintSumOfUses
Aggregate62
Singular118

HAVING to filter data

What if we only need to return intSumOfUses based on when the function’s category is “Aggregate”? To do so, we add the “HAVING” clause:

1
2
3
4
5
6
7
8
9
10
11
SELECT
    txtCategory
    , SUM(intUses) AS intSumOfUses
FROM
    tblFunctions

-- 2 NEW LINES OF script:
HAVING
    txtCategory = 'Aggregate'
GROUP BY
    txtCategory;

Returns:

txtCategoryintSumOfUses
Aggregate62

That is all for now. Feel free to look for our related documents that use the GROUP BY TimescaleDB clause, along with related aggregate functions like AVG, MIN, MAX, COUNT, and SUM.

Conclusion

We saw here how to use the GROUP BY statement in TimescaleDB. We investigated syntax and realistic examples, so as to increase your understanding of the many ways GROUP BY can be used when combined with the HAVING clause and the AVG and SUM functions. Code samples were included throughout this document. Code samples included.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.