Delete duplicate records from table in SQL
Redundancy is the most common issue in all databases. Very often, your database is filled up with redundant data and then you are burdened with the task of removing redundant data from your database. One of the most important tasks in this regard is deleting out the duplicate records. Writing a query to delete duplicate records is not an easy task but writing an efficient one is very important especially when your database has huge volumes of records. Let us go through some of the scenarios applicable when deleting records from a database.
Deleting all duplicate records
To delete all duplicate records from a table, check the below sample query. This query will delete all the records having the same combination of firstname and lastname
delete FROM Student WHERE FirstName IN
(SELECT FirstName FROM Student GROUP BY FirstName, LastName HAVING (COUNT(FirstName)>1))
Sample output:
Left table- Records before deletion, Right table-Records after deletion
Keeping one instance of duplicate records
To keep one record (among duplicate records) and delete the remaining duplicates from a table, check the below sample query. This query will keep one record and will delete the remaining record having same firstname. Note that I have not deleted the records by firstname-lastname combination.
DELETE FROM Student WHERE StudentID NOT IN
(SELECT MAX(StudentID) FROM Student GROUP BY FirstName)
Sample output:
Left table- Records before deletion, Right table-Records after deletion
Delete records using table expression
Below query uses table expression to delete duplicate records. Here, I have matched three columns (FirstName, LastName, Age) to determine duplicate data
Note that this query will execute slowly, compared to above two methods because of the use of table expressions and partition by clause. This technique of using table expression would be useful when you want to do some additional activity with the table expression after deleting the records
With TempStudent(dupCount) --Columns of Temporary table
AS
( SELECT row_number() over (partition by FirstName, LastName, Age order by studentid) AS dupCount from Student)
DELETE FROM TempStudent WHERE dupCount > 1
Sample output:
Conclusion:
In this article, we went through some of the most commonly faced scenarios while deleting duplicate records from a table.