Group By in CockroachDB

Have a Database Problem? Speak with an Expert for Free
Get Started >>

Introduction

In this lesson document, we will investigate use of Group By in CockroachDB. During this lesson, we will use the following structure:

  • What does GROUP BY in SQL look like and do in Cockroach databases?
  • How do we use GROUP BY in our CockroachDB SQL scripts?
  • When in Cockroach do we use the GROUP BY statement? We’ll answer this by looking at a realistic use case. Includes use of Cockroach’s HAVING SQL clause, which is almost identical to the WHERE SQL clause. In addition to that, we will include some examples where you may notice we make use of the AVG and SUM aggregate functions.

Prerequisites

  • A novice or beginner level of knowledge of building SQL queries to add, remove, or view data from a Cockroach database (or similar databases like MS Access, SQL Server, Postgres, Oracle, etc.) using the DBeaver tool, another relational db admin tool, and/or with script or languages like Python, Java, Node, VB.Net, C#, PHP, ASP.Net, B4a, B4i, etc. that provide us with a database connection, as well as a method for executing SQL on our database tables.
  • Comprehension of common SQL statements, including SELECT and FROM, as well as clauses like WHERE and HAVING, and operators that include LIKE. If you have no exposure up to now with the HAVING clause, it’s okay, as we will be learning about it here.

What is GROUP BY?

The Cockroach GROUP BY command creates a grouping of a set of values so that you can use group-level (aggregate) functions like Avg, Bit_And, Bit_Or, Count, Max, Min, and Sum upon the data.

The GROUP BY statement in CockroachDB is used along with SELECT to group rows in a table that have the same data. The purpose here is to remove redundancy in the output and/or to return aggregates that apply to the grouped data.

In your CockroachDB SQL, the GROUP BY statement comes after the WHERE or HAVING clause and comes before the ORDER BY statement.

Using Group By

Syntax of Group By

1
2
3
4
5
6
7
8
9
10
11
SELECT
    idNotUnique
    , Some_Aggregate_Func(intMeasurement)
FROM
    [tbl_name]
WHERE/HAVING
    [my_conditions]
GROUP BY
    idNotUnique;
ORDER BY
    [colNameToSortOn];

Analysis of Group By syntax:

In the above SQL, “Some_Aggregate_Func” represents a function that acts upon a group of values, as mentioned in the prior section.

HAVING vs WHERE

HAVING and WHERE perform equivalent but different roles.

  • The HAVING statement is for filtering values from a set. This is what you will typically use when using aggregate functions in your SQL. A GROUP BY is required.
  • WHERE clauses are used to filter records from a result. This filtering happens before any groupings are made. This statement can be used with or without GROUP BY being in the overall SQL script.

Why use GROUP BY?

Any time we need to group data that has common values. We use GROUP BY to either remove redundancy or make use of an aggregate function like SUM, for example. Below, are more examples.

Aggregate functions

We have named the following dataset as a table called tblFunctions:

txtNameFunctiontxtCategorytxtOperatesOnintDifficultyintUses
char_lengthSingularstring432
lengthSingularstring445
lowerSingularstring321
upperSingularstring320
avgAggregatenumber512
countAggregateall514
minAggregatenumber58
maxAggregatenumber58
sumAggregatenumber520

Analysis of the columns and rows in the above table:

  • txtNameFunction is the name of the function.
  • txtCategory tells if this function’s purpose is for singular or aggregates. If “Aggregate,” then we know this is a function that requires a GROUP BY clause be in the SQL script.
  • txtOperatesOn is whether we can feed this function a string, a number, or “all” (both).
  • intDifficulty is the difficulty to learn, understand, and make use of the function.
  • intUses is a percentage value (expressed as an integer) of how often the function is used in code throughout the company’s applications.

Next we will use GROUP BY on the above table and for now, we will be leaving out any functions.

Group By example

1
2
3
SELECT txtCategory
FROM tblFunctions
GROUP BY txtCategory;

The above Cockroach query returns:

txtCategory
Singular
Aggregate

Why are we seeing no numbers in the recordset returned?

  • We instructed the Cockroach database to look at only one column, which is “txtCategory”.
  • We pretty much said “Examine the data in that one column and show only the unique values.

With some repetition, this will all become more clear, so now we’ll do the same with another column in our table:

1
2
3
4
5
6
SELECT
    txtOperatesOn
FROM
    tblFunctions
GROUP BY
    txtOperatesOn;

The above Cockroach query returns:

txtOperatesOn
all
number
string

Notice how duplicate values in the txtOperatesOn column were aggregated?

Now you have seen two examples of a super simple use of GROUP BY. Now we will increase the difficulty just a little by adding in an aggregate function to get some obvious use out of this situation. We’ll write the SQL to use the GROUP BY method along with the added AVG function to see the average difficulty of all the functions in the Cockroach table we named tblFunctions.

1
2
3
4
5
6
7
SELECT
    txtCategory
    , Avg(intDifficulty) AS numAvgDifficulty
FROM
    tblFunctions
GROUP BY
    txtCategory;

Now to analyze what the SQL script we wrote above accomplishes:

  • SELECT: We are relaying to Cockroach’ interpretor that we want to get an average of all the values (because there is no HAVING statement) in the intDifficulty column, while naming the output to be AS numAvgDifficulty. The AVG function requires GROUP BY.
  • FROM: Our Cockroach data source is the tblFunctions table.
  • GROUP BY: Tells Cockroach that we are grouping the txtCategory column.

The above query returns:

txtCategorynumAvgDifficulty
Aggregate4
Singular2.5

Next, we will show another example, but now using the SUM function instead of the AVG function (both aggregate), in order to find the answer to the question, “What is used more often in our company, Singular functions or Aggregate functions?

1
2
3
4
5
6
7
SELECT
    txtCategory
    , SUM(intUses) AS intSumOfUses
FROM
    tblFunctions
GROUP BY
    txtCategory;

Returns:

txtCategoryintSumOfUses
Aggregate62
Singular118

HAVING to filter data

What if we only want to return intSumOfUses based on when the function’s category is “Aggregate”? To do so, we add the “HAVING” clause:

1
2
3
4
5
6
7
8
9
10
11
SELECT
    txtCategory
    , SUM(intUses) AS intSumOfUses
FROM
    tblFunctions

-- 2 NEW LINES OF script:
HAVING
    txtCategory = 'Aggregate'
GROUP BY
    txtCategory;

Returns:

txtCategoryintSumOfUses
Aggregate62

That’s all for now. -Feel free to look for our related documents that use the GROUP BY CockroachDB clause, along with related aggregate functions like AVG, MIN, MAX, COUNT, and SUM.

Conclusion

We saw here how to use the GROUP BY statement in CockroachDB. We explored syntax and realistic examples, so as to increase your knowledge of the many ways GROUP BY can be used when combined with the HAVING clause and the AVG and SUM functions. Code samples were included throughout this document.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.