PostgreSQL Union vs Union All

Introduction

In this article we will discuss the UNION and UNION ALL operators in PostgreSQL database. These two operators are both used combine results from different tables. It’s rare that you’ll develop an application without ever having to create queries that combine result sets from multiple tables. Unions come from set theory. In set theory there is an operation called Union that combines two different or same sets and gives a unique third set as a solution. How union works in set is as follows:

Consider the set A and B as in figure below the union of these two sets will be all the values of set A and all the values of set B except for duplicates i.e. those same values that are in both A and B will be included only once.

Image from Gyazo

So, let’s consider the Example of Set A and B to be: A = {1,2,3,4} B = {2,4,5,6} So, A U B will be {1,2,3,4,5,6}.

This was the Union operator in Set theory. Now let’s see how PostgreSQL provides that functionality.

Union in PostgreSQL

In PostgreSQL this facility to merge two different or same tables similar to the two sets A and B is the UNION operator. In PostgreSQL UNION operator is used to combine two SELECT statements. The syntax is as follows:

SELECT [column1…columnN]
FROM table_1
[WHERE conditions]
UNION
SELECT [column1…columnN]
FROM table_2
[WHERE conditions]

Let’s use UNION in an example. Consider Two tables named as SalesA and SalesB respectively. The data in Both tables is shown below. Both tables have same columns but different data rows:

salesa:

  name  | numsales
--------+----------
 Henry  |       50
 Edward |       60
 Sheila |      100
(3 rows)

salesb:

  name  | numsales
--------+----------
 Henry  |       50
 Edward |       60
 Sheila |       99
(3 rows)

Now the SQL statement for UNION of these two tables will be:

SELECT * FROM salesa
UNION
SELECT * FROM salesb

The result will be a new table with merged data rows from both tables

  name  | numsales
--------+----------
 Henry  |       50
 Edward |       60
 Sheila |       99
 Sheila |      100
(4 rows)

There are four rows in the combined result set because the UNION operator removes two duplicate rows. As you may have noticed in the above example the columns in both tables were the same. PostgreSQL has some special rules for UNION operator to work: 1. Each SELECT statement in the UNION query should have same number of columns. 2. The columns must also have similar data types. 3. The columns in each SELECT statement must also be in same order.

Let’s see how these rules effect the UNION statement.

First Rule: Let’s examine the first rule and see what happens when we violate it with a union like this:

SELECT name, numsales from salesa
UNION
SELECT name from salesb;

Result:

ERROR:  each UNION query must have the same number of columns
LINE 3: SELECT name from salesb;

This time we selected name and amount from salesa and only name from salesb and because this violates rule 1 so PostgreSQL gives and error stating “Each UNION query must have same number of columns”.

Second Rule: To check the second rules let’s create a new table salesc with numsales as a text type.

CREATE TABLE salesc (name VARCHAR (50), numsales VARCHAR (50));

Now if we take the union of salesa and saless it will give an error because salesa amount column is in integer and salesc amount column is in Text.

SELECT numsales from salesa
UNION
SELECT numsales from salesc;

Returns:

ERROR:  UNION types integer and character varying cannot be matched
LINE 3: SELECT numsales from salesc

Third Rule: This rule is simple. In order to visualize lets again take example of salesa and salesb but this time we will change the order of the columns in the select statements. We will take numsales first and name as second column in salesa and vice versa in salesb.

SELECT numsales, name from salesa
UNION
SELECT name, numsales from salesb;

Returns:

ERROR:  UNION types integer and character varying cannot be matched
LINE 3: SELECT name, numsales from salesb;

It gave the same error as in second rules because the query tried to match the name column with account column which was wrong.

Union All in PostgreSQL

In previous we saw how the UNION operator works in PostgreSQL. There is another operator in PostgreSQL that works similar to UNION operator and that is UNION ALL. UNION ALL works just like UNION with the same conditions that UNION had i.e. the three previously defined rules the only difference is that UNION removes duplicate rows but UNION ALL does not remove duplicate rows. UNION ALL just merges all the rows that satisfy all the conditions. In order to understand it lets consider the same sales tables example. This time we will use UNION ALL operator to see the difference.

We start with these two tables.

Table salesa:

  name  | numsales
--------+----------
 Henry  |       50
 Edward |       60
 Sheila |      100
(3 rows)

Table salesb:

  name  | numsales
--------+----------
 Henry  |       50
 Edward |       60
 Sheila |       99
(3 rows)

Notice that there are two rows that are same in both tables i.e. (Henry, 50) and (Edward, 60).

Now if we use UNION ALL on this table:

SELECT * FROM salesa
UNION ALL
SELECT * FROM salesb;

The result set is:

  name  | numsales
--------+----------
 Henry  |       50
 Edward |       60
 Sheila |      100
 Henry  |       50
 Edward |       60
 Sheila |       99
(6 rows)

Notice the result set included all the rows even the duplicate ones.

All other things about UNION ALL and UNION are the same but here comes the question why UNION ALL exists when there is UNION that does all things. So, the answer to that question is performance efficiency.

Performance is another difference between UNION ALL and UNION. UNION ALL is faster than UNION as it does not remove the duplicates so when there is performance constraint, we will use UNION ALL because in performance issues we usually want to just analyze the data set. Let’s see an example as follows.

Performance of UNION:

SELECT 'foo' AS bar UNION SELECT 'foo' AS bar

Result:

+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.01 sec)

Performance of UNION ALL:

SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar

Result:

+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)

Order

Note that on UNION and UNION ALL merge the rows in a non-specific order so in order to avoid such scenario, use the PostgreSQL ORDER BY command.

Conclusion

Both UNION and UNION ALL are used for merging the rows of two SELECT statements. The difference is that UNION removes duplicates whereas UNION ALL does not which results in slower performance.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.