Group By in CockroachDB
Introduction
In this lesson document, we will investigate use of Group By in CockroachDB. During this lesson, we will use the following structure:
- What does GROUP BY in SQL look like and do in Cockroach databases?
- How do we use GROUP BY in our CockroachDB SQL scripts?
- When in Cockroach do we use the GROUP BY statement? We’ll answer this by looking at a realistic use case. Includes use of Cockroach’s HAVING SQL clause, which is almost identical to the WHERE SQL clause. In addition to that, we will include some examples where you may notice we make use of the AVG and SUM aggregate functions.
Prerequisites
- A novice or beginner level of knowledge of building SQL queries to add, remove, or view data from a Cockroach database (or similar databases like MS Access, SQL Server, Postgres, Oracle, etc.) using the DBeaver tool, another relational db admin tool, and/or with script or languages like Python, Java, Node, VB.Net, C#, PHP, ASP.Net, B4a, B4i, etc. that provide us with a database connection, as well as a method for executing SQL on our database tables.
- Comprehension of common SQL statements, including SELECT and FROM, as well as clauses like WHERE and HAVING, and operators that include LIKE. If you have no exposure up to now with the HAVING clause, it’s okay, as we will be learning about it here.
What is GROUP BY?
The Cockroach GROUP BY command creates a grouping of a set of values so that you can use group-level (aggregate) functions like Avg, Bit_And, Bit_Or, Count, Max, Min, and Sum upon the data.
The GROUP BY statement in CockroachDB is used along with SELECT to group rows in a table that have the same data. The purpose here is to remove redundancy in the output and/or to return aggregates that apply to the grouped data.
In your CockroachDB SQL, the GROUP BY statement comes after the WHERE or HAVING clause and comes before the ORDER BY statement.
Using Group By
Syntax of Group By
1 2 3 4 5 6 7 8 9 10 11 | SELECT idNotUnique , Some_Aggregate_Func(intMeasurement) FROM [tbl_name] WHERE/HAVING [my_conditions] GROUP BY idNotUnique; ORDER BY [colNameToSortOn]; |
Analysis of Group By syntax:
In the above SQL, “Some_Aggregate_Func” represents a function that acts upon a group of values, as mentioned in the prior section.
HAVING vs WHERE
HAVING and WHERE perform equivalent but different roles.
- The HAVING statement is for filtering values from a set. This is what you will typically use when using aggregate functions in your SQL. A GROUP BY is required.
- WHERE clauses are used to filter records from a result. This filtering happens before any groupings are made. This statement can be used with or without GROUP BY being in the overall SQL script.
Why use GROUP BY?
Any time we need to group data that has common values. We use GROUP BY to either remove redundancy or make use of an aggregate function like SUM, for example. Below, are more examples.
Aggregate functions
We have named the following dataset as a table called tblFunctions:
txtNameFunction | txtCategory | txtOperatesOn | intDifficulty | intUses |
---|---|---|---|---|
char_length | Singular | string | 4 | 32 |
length | Singular | string | 4 | 45 |
lower | Singular | string | 3 | 21 |
upper | Singular | string | 3 | 20 |
avg | Aggregate | number | 5 | 12 |
count | Aggregate | all | 5 | 14 |
min | Aggregate | number | 5 | 8 |
max | Aggregate | number | 5 | 8 |
sum | Aggregate | number | 5 | 20 |
Analysis of the columns and rows in the above table:
- txtNameFunction is the name of the function.
- txtCategory tells if this function’s purpose is for singular or aggregates. If “Aggregate,” then we know this is a function that requires a GROUP BY clause be in the SQL script.
- txtOperatesOn is whether we can feed this function a string, a number, or “all” (both).
- intDifficulty is the difficulty to learn, understand, and make use of the function.
- intUses is a percentage value (expressed as an integer) of how often the function is used in code throughout the company’s applications.
Next we will use GROUP BY on the above table and for now, we will be leaving out any functions.
Group By example
1 2 3 | SELECT txtCategory FROM tblFunctions GROUP BY txtCategory; |
The above Cockroach query returns:
txtCategory |
---|
Singular |
Aggregate |
Why are we seeing no numbers in the recordset returned?
- We instructed the Cockroach database to look at only one column, which is “txtCategory”.
- We pretty much said “Examine the data in that one column and show only the unique values.
With some repetition, this will all become more clear, so now we’ll do the same with another column in our table:
1 2 3 4 5 6 | SELECT txtOperatesOn FROM tblFunctions GROUP BY txtOperatesOn; |
The above Cockroach query returns:
txtOperatesOn |
---|
all |
number |
string |
Notice how duplicate values in the txtOperatesOn column were aggregated?
Now you have seen two examples of a super simple use of GROUP BY. Now we will increase the difficulty just a little by adding in an aggregate function to get some obvious use out of this situation. We’ll write the SQL to use the GROUP BY method along with the added AVG function to see the average difficulty of all the functions in the Cockroach table we named tblFunctions.
1 2 3 4 5 6 7 | SELECT txtCategory , Avg(intDifficulty) AS numAvgDifficulty FROM tblFunctions GROUP BY txtCategory; |
Now to analyze what the SQL script we wrote above accomplishes:
- SELECT: We are relaying to Cockroach’ interpretor that we want to get an average of all the values (because there is no HAVING statement) in the intDifficulty column, while naming the output to be AS numAvgDifficulty. The AVG function requires GROUP BY.
- FROM: Our Cockroach data source is the tblFunctions table.
- GROUP BY: Tells Cockroach that we are grouping the txtCategory column.
The above query returns:
txtCategory | numAvgDifficulty |
---|---|
Aggregate | 4 |
Singular | 2.5 |
Next, we will show another example, but now using the SUM function instead of the AVG function (both aggregate), in order to find the answer to the question, “What is used more often in our company, Singular functions or Aggregate functions?
1 2 3 4 5 6 7 | SELECT txtCategory , SUM(intUses) AS intSumOfUses FROM tblFunctions GROUP BY txtCategory; |
Returns:
txtCategory | intSumOfUses |
---|---|
Aggregate | 62 |
Singular | 118 |
HAVING to filter data
What if we only want to return intSumOfUses based on when the function’s category is “Aggregate”? To do so, we add the “HAVING” clause:
1 2 3 4 5 6 7 8 9 10 11 | SELECT txtCategory , SUM(intUses) AS intSumOfUses FROM tblFunctions -- 2 NEW LINES OF script: HAVING txtCategory = 'Aggregate' GROUP BY txtCategory; |
Returns:
txtCategory | intSumOfUses |
---|---|
Aggregate | 62 |
That’s all for now. -Feel free to look for our related documents that use the GROUP BY CockroachDB clause, along with related aggregate functions like AVG, MIN, MAX, COUNT, and SUM.
Conclusion
We saw here how to use the GROUP BY statement in CockroachDB. We explored syntax and realistic examples, so as to increase your knowledge of the many ways GROUP BY can be used when combined with the HAVING clause and the AVG and SUM functions. Code samples were included throughout this document.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started