Using a Postgres Subquery
Introduction
In this training session, we learn the best ways of using a Postgres subquery. During this lesson, we will use and learn about nested select in Postgres, using a subquery with select, update, and insert into. We’ll also look at how “IN” and “NOT IN” work in a subquery in PostgreSQL. A simple way of describing a PostgreSQL subquery is “A query within a query.”
Prerequisites
- A basic grasp of how to create queries for a PostgreSQL database (or other SQL-compliant databases like Cockroach or MS SQL Server) using one of the many free tools available, such as PG Admin, dBeaver, or with programming languages like Javascript, Python, Java, PHP, C#, ASP.Net, VB.Net, Node, B4X, Classic ASP, etc.) that provide connections to your database, as well as a method for using SQL queries with your database to get data or make changes to your data.
- Sense of how common SQL statements like SELECT, FROM, and WHERE work because we will build on those statements in this training session.
What is a Postgres Inner Select?
An inner select or inner query is otherwise known as a subquery in Postgres.
In a Postgres application we may sometimes want to retrieve data from a subset of data created on the fly, insert data into a table based on a subset of data, or update a table of data based on a data subset. We’ll be exploring all these ways of using a Postgres subquery (inner select).
Postgres subquery syntax
Subquery in WHERE syntax
In this syntax example, we will show the most simple form of a Postgres subquery, placing the inner select in the WHERE clause.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | SELECT column_field_01 , column_field_02 , column_field_03 FROM tbl_data WHERE column_field_01 IN -- this can also be "NOT IN", "EXISTS, an operator like "=", "<", and others. ( SELECT column_field_01 FROM tbl_data WHERE [condition] ) ORDER BY column_field_01 |
Analysis
In the code example above, we show two sets of data that are pulled from the same table; “tbl_data”. The simple view here is that the “inner” nested data set – “inner query” – the one in the WHERE clause, runs first, and then the outer query is filtered based on the results of the data returned by the nested query. The IN statement within that WHERE clause is important. It is the command that tells PostgreSQL’s SQL engine to pull only rows from the table where the value in column_field_01 exists in the rows returned by the inner select SQL.
Subquery in FROM syntax
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | SELECT column_field_01 , column_field_02 , column_field_03 , nested.column_field_04 FROM ( SELECT column_field_04 FROM tbl_data WHERE [condition] ) AS inner_query |
Analysis
Building on what you learned above, the same principle applies here. In this example, we gave a name to the inner query, “inner_query”, so that we can refer to it from the outer query at the top of the full SQL. Note that we have enclosed the inner select query in parenthesis. This is a must.
Prior to creating a real world example for study, let’s look at the syntax for using a subquery with INSERT INTO.
Subquery in INSERT syntax
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | INSERT INTO tbl_data ( column_field_01 , column_field_02 ) VALUES ( SELECT column_field_03 , column_field_04 FROM tbl_other WHERE [condition] ) |
Analysis
In the INSERT INTO type query above, we started by telling Postgres to add some rows to “tbl_data”. Next, we are deciding which columns (column_field_01 and column_field_02) that we wish to fill with the two VALUES returned by the subquery that follows and is enclosed in parentheses. Finally, it is important to notice that the inner select pulls its data from “tbl_other”, utilizing a potential condition in the WHERE clause at the end of the subquery. Notice here one way we departed from the previous examples is that we are using two tables. Not a whole lot more complex, but important to note.
Why use a Postgres Subquery
Here’s a problem that is fairly typical in one form or another where subqueries can come in handy. If a database developer didn’t know subqueries existed, they might believe they need to go through the convoluted process of retrieving data from a table into a list or array, manage the array in some way such as adding items, removing items, and/or sorting, and then use that array to add specific rows back into a table. Luckily, a subquery (nested select) provides a far more efficient method for doing that. We’ll start by scripting the SQL, which will be structurally similar to Subquery in INSERT syntax above.
Insert data using subquery in Postgres
First step is to set up two tables with test data:
tbl_tech_used
t_tech_name | t_tech_cat | i_rating |
---|---|---|
Javascript | Language | 94 |
PostgreSQL | SQL database | 91 |
Python | Language | 93 |
MS SQL Server | SQL database | 92 |
C# | Language | 92 |
The above table is the table used by the company to track which technologies are being used by the company. The table below is filled with potential technologies. It’s important to notice the bottom table includes tech that is already in the top table. So when we are building our INSERT INTO query using a NESTED SELECT, we want to be sure to keep duplicates out. This is a use case where nested selects excel.
tbl_tech_proposed
t_tech_name | t_tech_cat | i_rating |
---|---|---|
Javascript | Language | 94 |
Mongo | NoSQL database | 85 |
MySQL | SQL database | 50 |
PostgreSQL | SQL database | 91 |
Python | Language | 93 |
PHP | Language | 71 |
Java | Language | 74 |
C++ | Language | 72 |
C# | Language | 93 |
Writing the PostgreSQL SQL
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | INSERT INTO tbl_tech_used ( t_tech_name , t_tech_cat , i_rating ) VALUES ( SELECT t_tech_name , t_tech_cat , i_rating FROM tbl_tech_proposed WHERE i_rating > 75 AND t_tech_name NOT IN ( SELECT t_tech_name FROM tbl_tech_used ) ) |
Analysis
The first thing you may notice with our example above is that we have a nested SELECT in another nested SELECT! Why did we do this? Because we want to be sure we do not add a row to tbl_tech_used that is already in that table. Let’s take it line by line:
- INSERT INTO: This is where we set up which columns in “tbl_tech_used” get populated with data from the first (outer) of our SELECT statements below.
- VALUES: This is part of the INSERT functionality.
- SELECT (OUTER): Determines which fields (columns) are pulled from “tbl_tech_proposed”.
- WHERE: This line filters results of our subquery by two requirements. (1) “i_rating” must be larger than 75. (2) “t_tech_name” must not already exist in “tbl_tech_used”.
- NOT IN … SELECT (INNER): This is the part of our query that makes sure we don’t copy duplicate records from “tbl_tech_proposed” into “tbl_tech_used”.
After we run that query, we get the following resulting dataset:
tbl_tech_used
t_tech_name | t_tech_cat | i_rating |
---|---|---|
Javascript | Language | 94 |
PostgreSQL | SQL database | 91 |
Python | Language | 93 |
MS SQL Server | SQL database | 92 |
C# | Language | 93 |
Miscellaneous
- Are you curious why we used a prefix of “i” or “t” in some columns? In this article, we used “i” to mean integer and “t” to mean text or string. Here is a Naming conventions on that topic.
Conclusion
We learned in this training tutorial some common / popular ways of using a Postgres subquery. During this lesson, we used and learned about nested select in Postgres, using subqueries with select, update, and insert into. We also looked at how “IN” and “NOT IN” work in a subquery in PostgreSQL. Finally, we saw evidence that a simple way of describing a PostgreSQL subquery is “A query within a query.” Code samples were included.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started