PostgreSQL Union vs Union All
Introduction
In this article we will discuss the UNION and UNION ALL operators in PostgreSQL database. These two operators are both used combine results from different tables. It’s rare that you’ll develop an application without ever having to create queries that combine result sets from multiple tables. Unions come from set theory. In set theory there is an operation called Union that combines two different or same sets and gives a unique third set as a solution. How union works in set is as follows:
Consider the set A and B as in figure below the union of these two sets will be all the values of set A and all the values of set B except for duplicates i.e. those same values that are in both A and B will be included only once.
So, let’s consider the Example of Set A and B to be: A = {1,2,3,4} B = {2,4,5,6} So, A U B will be {1,2,3,4,5,6}.
This was the Union operator in Set theory. Now let’s see how PostgreSQL provides that functionality.
Union in PostgreSQL
In PostgreSQL this facility to merge two different or same tables similar to the two sets A and B is the UNION operator. In PostgreSQL UNION operator is used to combine two SELECT statements. The syntax is as follows:
1 2 3 4 5 6 7 | SELECT [column1…columnN] FROM table_1 [WHERE conditions] UNION SELECT [column1…columnN] FROM table_2 [WHERE conditions] |
Let’s use UNION in an example. Consider Two tables named as SalesA and SalesB respectively. The data in Both tables is shown below. Both tables have same columns but different data rows:
salesa:
1 2 3 4 5 6 | name | numsales --------+---------- Henry | 50 Edward | 60 Sheila | 100 (3 rows) |
salesb:
1 2 3 4 5 6 | name | numsales --------+---------- Henry | 50 Edward | 60 Sheila | 99 (3 rows) |
Now the SQL statement for UNION of these two tables will be:
1 2 3 | SELECT * FROM salesa UNION SELECT * FROM salesb |
The result will be a new table with merged data rows from both tables
1 2 3 4 5 6 7 | name | numsales --------+---------- Henry | 50 Edward | 60 Sheila | 99 Sheila | 100 (4 rows) |
There are four rows in the combined result set because the UNION operator removes two duplicate rows. As you may have noticed in the above example the columns in both tables were the same. PostgreSQL has some special rules for UNION operator to work: 1. Each SELECT statement in the UNION query should have same number of columns. 2. The columns must also have similar data types. 3. The columns in each SELECT statement must also be in same order.
Let’s see how these rules effect the UNION statement.
First Rule: Let’s examine the first rule and see what happens when we violate it with a union like this:
1 2 3 | SELECT name, numsales from salesa UNION SELECT name from salesb; |
Result:
1 2 | ERROR: each UNION query must have the same number of columns LINE 3: SELECT name from salesb; |
This time we selected name and amount from salesa and only name from salesb and because this violates rule 1 so PostgreSQL gives and error stating “Each UNION query must have same number of columns”.
Second Rule: To check the second rules let’s create a new table salesc with numsales as a text type.
1 | CREATE TABLE salesc (name VARCHAR (50), numsales VARCHAR (50)); |
Now if we take the union of salesa and saless it will give an error because salesa amount column is in integer and salesc amount column is in Text.
1 2 3 | SELECT numsales from salesa UNION SELECT numsales from salesc; |
Returns:
1 2 | ERROR: UNION types integer and character varying cannot be matched LINE 3: SELECT numsales from salesc |
Third Rule: This rule is simple. In order to visualize lets again take example of salesa and salesb but this time we will change the order of the columns in the select statements. We will take numsales first and name as second column in salesa and vice versa in salesb.
1 2 3 | SELECT numsales, name from salesa UNION SELECT name, numsales from salesb; |
Returns:
1 2 | ERROR: UNION types integer and character varying cannot be matched LINE 3: SELECT name, numsales from salesb; |
It gave the same error as in second rules because the query tried to match the name column with account column which was wrong.
Union All in PostgreSQL
In previous we saw how the UNION operator works in PostgreSQL. There is another operator in PostgreSQL that works similar to UNION operator and that is UNION ALL. UNION ALL works just like UNION with the same conditions that UNION had i.e. the three previously defined rules the only difference is that UNION removes duplicate rows but UNION ALL does not remove duplicate rows. UNION ALL just merges all the rows that satisfy all the conditions. In order to understand it lets consider the same sales tables example. This time we will use UNION ALL operator to see the difference.
We start with these two tables.
Table salesa:
1 2 3 4 5 6 | name | numsales --------+---------- Henry | 50 Edward | 60 Sheila | 100 (3 rows) |
Table salesb:
1 2 3 4 5 6 | name | numsales --------+---------- Henry | 50 Edward | 60 Sheila | 99 (3 rows) |
Notice that there are two rows that are same in both tables i.e. (Henry, 50) and (Edward, 60).
Now if we use UNION ALL on this table:
1 2 3 | SELECT * FROM salesa UNION ALL SELECT * FROM salesb; |
The result set is:
1 2 3 4 5 6 7 8 9 | name | numsales --------+---------- Henry | 50 Edward | 60 Sheila | 100 Henry | 50 Edward | 60 Sheila | 99 (6 rows) |
Notice the result set included all the rows even the duplicate ones.
All other things about UNION ALL and UNION are the same but here comes the question why UNION ALL exists when there is UNION that does all things. So, the answer to that question is performance efficiency.
Performance is another difference between UNION ALL and UNION. UNION ALL is faster than UNION as it does not remove the duplicates so when there is performance constraint, we will use UNION ALL because in performance issues we usually want to just analyze the data set. Let’s see an example as follows.
Performance of UNION:
1 | SELECT 'foo' AS bar UNION SELECT 'foo' AS bar |
Result:
1 2 3 4 5 6 | +-----+ | bar | +-----+ | foo | +-----+ 1 row in set (0.01 sec) |
Performance of UNION ALL:
1 | SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar |
Result:
1 2 3 4 5 6 7 | +-----+ | bar | +-----+ | foo | | foo | +-----+ 2 rows in set (0.00 sec) |
Order
Note that on UNION and UNION ALL merge the rows in a non-specific order so in order to avoid such scenario, use the PostgreSQL ORDER BY command.
Conclusion
Both UNION and UNION ALL are used for merging the rows of two SELECT statements. The difference is that UNION removes duplicates whereas UNION ALL does not which results in slower performance.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started