DELETE all duplicates from TABLE

By | 29 October, 2013

If we have a long table with thousands of records with some duplicate contents and we want to trim it down to the bare essentials (no duplicates), here’s one (of several) way of doing it:

For this, assume we have:

– a table called “emails”, with fields: id_email, email
– the table has several records with duplicate emails in the “email” field

So we do it like this:

DELETE FROM emails 
WHERE id_email NOT IN (SELECT MIN(id_email) FROM emails GROUP BY email)

Let’s break it down:

SELECT MIN(id_email) FROM emails GROUP BY email

Gets all records from table “emails”, groups it by “email” field – this removes duplicates from the list – and the MIN(id_mail) selects the lowest value in the field “id_email” of the found duplicates. For example:

imagine you have two records like this:

id_email email
23 blah@blah.com
24 blah@blah.com

the SELECT above would group both records (they have equal “email” fields), and list the one with the lower “id_email” (the MIN(id_email) does that), which is 23, so we would get a list with record 23, got it?

The rest:

DELETE FROM emails WHERE id_email NOT IN ... 

Deletes all records whose “id_email” are NOT in the list we explained above.

So, to take the above example, the record with id_email=24 wouldn’t be on the list of the SELECT, right? Remember it discarded the id_email=24 for the id_email=23 because of the MIN()? Well, so the record with id_email=24 would be deleted, as would all others whose id_email wasnt on the list.

There are obviously other ways of achieving the same result.