Selecting all unique rows from duplicate rows in table in SQL Server

Hi,

This post will show how you can write a select query to select only unique rows from a table that has many duplicate rows. In other words, this post will show the query to select distinct rows from a table that has duplicate rows.

Let’s say we have a table duplicated_table that has the following rows.

Initial Table
ID Name Email Age
1 Amogh amogh@example.com 24
1 Amogh amogh@example.com 24
2 Uday uday@example.com 23
2 Uday uday@example.com 23
3 Pati pati@example.com 24
3 Pati pati@example.com 24
3 Pati pati@example.com 24
and so on

Now, we can see that all the rows are repeated at least once and we need the output as shown below:

Expected Output
ID Name Email Age
1 Amogh amogh@example.com 24
2 Uday uday@example.com 23
3 Pati pati@example.com 24
and so on — Only unique rows will appear here — no duplicates

First thing to observe here is that there is no unique column in the table. So we need something to uniquely identify the column in the duplicated table. For this, we can use the ROW_NUMBER() function of SQL server. ROW_NUMBER() returns a unique row number for the current row.

So now, the logic that we can use for our purpose is:

  1. Create a data source that will select all the required data that is grouped together by a column, along with a row number to each row
  2. From this dynamically created data source, select those columns that are required but with a row number of “1”

It is important to understand point number 2 above. The output of point number 1 will be as shown below:

Table with Row Numbers
Row Number ID Name Email Age
1 1 Amogh amogh@example.com 24
2 1 Amogh amogh@example.com 24
1 2 Uday uday@example.com 23
2 2 Uday uday@example.com 23
1 3 Pati pati@example.com 24
2 3 Pati pati@example.com 24
3 3 Pati pati@example.com 24
and so on

The above table will be the output of the first point mentioned above. That is, create a data source that selects all the data along with the row number. Observe the “Row Number” column in the above table. Though the table has duplicate rows, each duplicate row has a unique row number.

Now, the only thing left to do is to select those rows from the above table that have row number = 1. And you will end up with the desired output.

Now let’s see the queries for both the steps.

For step – 1: Create a data source that selects all the data along with row number.

;with dataSrc as 
( 
     select ROW_NUMBER() over(partition by Name order by Name) as RowNum,
     ID, Name, email, age from duplicated_table 
)

For step – 2: Selecting those rows with row number = 1;

select Id, Name, email, age from dataSrc where RowNum = 1;

Final Complete Query :

;with dataSrc as 
( 
    select ROW_NUMBER() over(partition by Name order by Name) as RowNum,
    ID, Name, email, age from duplicated_table 
) 
select Id, Name, email, age from dataSrc where RowNum = 1;

Hope this helps!

Advertisements

About Amogh Natu

Technology enthusiast, Associate Consultant @ Microsoft, music lover, love my guitar, Microsoft.NET, ASP.NET, C#.NET Professional.
This entry was posted in SQL Server and tagged , , , , , , , , , , , , . Bookmark the permalink.

Your comments or opinions.....

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s