How to Use Postgres to Remove Duplicates from a Table

Have a Database Problem? Speak with an Expert for Free
Get Started >>

Introduction

When you’re using PostgreSQL to manage data, there’s a chance you’ll end up with duplicate data in your tables. This can cause problems in your applications and lead to inaccuracies in your results. Fortunately, it’s easy to resolve this problem and clean up your tables with the help of the DELETE statement. In this article, we’ll provide some examples to show you how to use Postgres to remove duplicates from a table.

Prerequisites

Before we proceed, let’s go over a few key prerequisites that need in be in place for this tutorial:

  • Make sure that PostgreSQL is installed and running on your computer. You’ll need it to test out our examples showing how to use Postgres to remove duplicates from a table.
  • You should have some basic knowledge of PostgreSQL to get the most out of the instructions and examples presented in this tutorial.

PostgreSQL Examples

Let’s begin by examining the standard syntax of the DELETE statement:

1
DELETE FROM TABLE_NAME WHERE condition

It’s fairly straightforward: You specify the table from which you want to delete records right after the DELETE FROM statement, and you define the conditions these records must meet in the WHERE clause.

Create a table in PostgreSQL

Let’s create a sample table to use in our DELETE examples. We’ll name it pets:

1
2
3
4
CREATE TABLE pets(
id SERIAL PRIMARY KEY,
pet_type VARCHAR(50)
);

After we create the table, let’s insert records into it:

1
2
3
4
5
6
7
8
INSERT INTO pets(pet_type) VALUES('dog');
INSERT INTO pets(pet_type) VALUES('dog');
INSERT INTO pets(pet_type) VALUES('dog');
INSERT INTO pets(pet_type) VALUES('cat');
INSERT INTO pets(pet_type) VALUES('cat');
INSERT INTO pets(pet_type) VALUES('cat');
INSERT INTO pets(pet_type) VALUES('hamster');
INSERT INTO pets(pet_type) VALUES('hamster');

If we SELECT all data from the table now, it should return the following:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT * FROM pets;
 id | pet_type
----+----------
  1 | dog
  2 | dog
  3 | dog
  4 | cat
  5 | cat
  6 | cat
  7 | hamster
  8 | hamster
(8 ROWS)

Count duplicates

We can use the COUNT function to identify how many duplicates are in a table. We’ll count the duplicates in the pets table using the statement shown below:

1
SELECT pet_type, COUNT( pet_type ) FROM pets GROUP BY pet_type;
1
2
3
4
5
6
 pet_type | count
----------+-------
 dog      |     3
 cat      |     3
 hamster  |     2
(3 rows)

It’s easy to see which pet types have duplicates.

Remove duplicates from the table

Let’s assume that we want to clean up the pets table and eliminate duplicates. The following DELETE statement will remove the duplicates using a subquery:

1
2
3
DELETE FROM pets
WHERE id IN(SELECT id FROM(SELECT  id,ROW_NUMBER() OVER(PARTITION BY pet_type ORDER BY id)
AS row_num FROM pets) t WHERE t.row_num > 1);

This query uses the ROW_NUMBER() OVER and PARTITION BY clauses in tandem to group results of the same pet_type. It assigns a row number row_num to each row of a group, beginning with a value of ‘1’. When it encounters a group of the next pet_type, it starts the numbering over again with a row_num of ‘1’. You can see in the statement that we delete anything with a row_num greater than one, leaving just one row behind for each unique pet_type. This eliminates all duplicates from the table.

To check whether our delete operation worked, let’s display the contents of the table:

1
2
3
4
5
6
SELECT * FROM pets;
 id | pet_type
----+----------
  1 | dog
  4 | cat
  7 | hamster

For our next example, we’ll start by inserting a duplicate value again:

1
2
INSERT INTO pets(pet_type) VALUES('cat');
INSERT INTO pets(pet_type) VALUES('hamster');

We’ll display the contents of the table to confirm that the duplicate values are there:

1
2
3
4
5
6
7
8
9
SELECT * FROM pets;
 id | pet_type
----+----------
  1 | dog
  4 | cat
  7 | hamster
  8 | hamster
  9 | cat
(5 ROWS)

This time, we will only delete the duplicates of the cat pet_type. To accomplish this, we’ll use a different method of removing duplicates from the table:

1
2
DELETE FROM pets a USING pets b
WHERE a.id >b.id AND a.pet_type = b.pet_type AND a.pet_type = 'cat';

Let’s check our table contents again:

1
2
3
4
5
6
7
8
SELECT * FROM pets;
 id | pet_type
----+----------
  1 | dog
  4 | cat
  7 | hamster
  8 | hamster
(4 ROWS)

The duplicate value has been removed.

Conclusion

Duplicate values in a PostgreSQL table can sometimes cause problems, and you may want to remove them. In this article, we showed you how to use Postgres to remove duplicates from a table with the DELETE statement. Our examples showed how to remove all duplicates or just a particular subset of duplicates. If you’ve followed along with this tutorial, you’ll be prepared to clean up the tables in your own PostgreSQL installation.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.