How to Find Duplicate Values in a SQL Table?

How to Find Duplicate Values in a SQL Table?

Written by Bobby Iliev on Nov 19th, 2021 Views Report Post

Introduction

In this quick tutorial, you will learn how to find the duplicate values in a SQL table with a single SQL query.

Let's say that we have a table called users with the following data:

SELECT username,email FROM users;
+----------+---------------------+
| username | email               |
+----------+---------------------+
| bobby    | [email protected]   |
| devdojo  | [email protected] |
| tony     | [email protected]    |
| greisi   | [email protected]  |
| bobby    | [email protected]   |
| tony     | tony@devdojo        |
| tony     | tony@devdojo        |
| greisi   | greisi@devdojo      |
+----------+---------------------+

As you can see, there are multiple users with the same username and email. The result that we want to get is a list of all duplicate records ordered by the count:

| tony     | [email protected]   |        3 |
| bobby    | [email protected]  |        2 |
| greisi   | [email protected] |        2 |

Let's see how to do that!

Finding Duplicate Values in a SQL Table

In order to get the duplicate values, you can group by the two specific columns. In the example above those would be the username and the email columsn.

SELECT
    username, email, COUNT(*)
FROM
    users
GROUP BY
    username, email
HAVING 
    COUNT(*) > 1
ORDER BY count(*)
DESC;

The result of the above query would be:

 username |       email        | count 
----------+--------------------+-------
 tony     | [email protected]   |     3
 bobby    | [email protected]  |     2
 greisi   | [email protected] |     2

MySQL incompatible with sql_mode=only_full_group_by

If you are using MySQL you would get the following warning:

ERROR 1140 (42000): In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'demo_db.users.username'; this is incompatible with sql_mode=only_full_group_by

To work around this you can change the sql_mode for the current session:

SET sql_mode = '' 

Or if you want to permanently change this, you need to update the MySQL configuration file and set sql_mode under the [mysqld] section to:

[mysqld]  
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION";

After that you would have to restart the MySQL service.

Once this is done, the above query should work as expected and the output would be:

+----------+--------------------+----------+
| username | email              | COUNT(*) |
+----------+--------------------+----------+
| tony     | [email protected]   |        3 |
| bobby    | [email protected]  |        2 |
| greisi   | [email protected] |        2 |
+----------+--------------------+----------+

Conclusion

This is pretty much it! Now you know how to find duplicate values in a SQL table!

In case that you are just getting started with SQL, I would suggest making sure to check out this free eBook here:

💡 Introduction to SQL eBook

In case that you are already using SQL on daily basis, and are looking for a way to drastically reduce the latency of your data analytics, make sure to out Materialize!

Materialize - a streaming database

Materialize is a Streaming Database for Real-time Analytics. It is a reactive database that delivers incremental view updates and it helps developers easily build with streaming data using standard SQL.

Comments (0)