Introduction
In this quick tutorial, you will learn how to find the duplicate values in a SQL table with a single SQL query.
Let's say that we have a table called users
with the following data:
SELECT username,email FROM users;
+----------+---------------------+
| username | email |
+----------+---------------------+
| bobby | [email protected] |
| devdojo | [email protected] |
| tony | [email protected] |
| greisi | [email protected] |
| bobby | [email protected] |
| tony | tony@devdojo |
| tony | tony@devdojo |
| greisi | greisi@devdojo |
+----------+---------------------+
As you can see, there are multiple users with the same username and email. The result that we want to get is a list of all duplicate records ordered by the count:
| tony | [email protected] | 3 |
| bobby | [email protected] | 2 |
| greisi | [email protected] | 2 |
Let's see how to do that!
Finding Duplicate Values in a SQL Table
In order to get the duplicate values, you can group by the two specific columns. In the example above those would be the username
and the email
columsn.
SELECT
username, email, COUNT(*)
FROM
users
GROUP BY
username, email
HAVING
COUNT(*) > 1
ORDER BY count(*)
DESC;
The result of the above query would be:
username | email | count
----------+--------------------+-------
tony | [email protected] | 3
bobby | [email protected] | 2
greisi | [email protected] | 2
MySQL incompatible with sql_mode=only_full_group_by
If you are using MySQL you would get the following warning:
ERROR 1140 (42000): In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'demo_db.users.username'; this is incompatible with sql_mode=only_full_group_by
To work around this you can change the sql_mode
for the current session:
SET sql_mode = ''
Or if you want to permanently change this, you need to update the MySQL configuration file and set sql_mode
under the [mysqld]
section to:
[mysqld]
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION";
After that you would have to restart the MySQL service.
Once this is done, the above query should work as expected and the output would be:
+----------+--------------------+----------+
| username | email | COUNT(*) |
+----------+--------------------+----------+
| tony | [email protected] | 3 |
| bobby | [email protected] | 2 |
| greisi | [email protected] | 2 |
+----------+--------------------+----------+
Conclusion
This is pretty much it! Now you know how to find duplicate values in a SQL table!
In case that you are just getting started with SQL, I would suggest making sure to check out this free eBook here:
In case that you are already using SQL on daily basis, and are looking for a way to drastically reduce the latency of your data analytics, make sure to out Materialize!
Materialize is a Streaming Database for Real-time Analytics. It is a reactive database that delivers incremental view updates and it helps developers easily build with streaming data using standard SQL.
Comments (0)