Separating desired from undesired duplicate records

**mindscape** · 11-20-2007, 09:46 PM

I know of several methods to remove duplicate records but I recently encountered a unique situation where some duplicate records were actually acceptable.

Here is my situation:

I have a table that contains records of individuals who have children so 1 person can have 3 children with different birthdates; but there is also a field that has a specified language. The challenge arises when an individual may have only 1 child but has entered a single record twice. Once with a specifed language and again without which produces a default value of UNKNOWN.

I need to be able to remove this record without affecting records that may have a record entered twice as well; having 2 children for instance but they may also have a specified language in one record but a default value of UNKNOWN for their second record.

So I can't eliminate the unwanted duplicates by filtering out records that have UNKNOWN because I would also remove individuals that I need.

EX:
firstN | lastN | address | lang | childs birthdate
John Doe 210 Somewhere Ave ENG 1993-10-09
John Doe 210 Somewhere Ave UNK 1993-10-09
Jane Doe 210 Anywhere Ave ENG 1969-12-23
Jane Doe 210 Anywhere Ave UNK 1958-04-15

How could you remove the duplicate for John in this example without affecting Janes duplicate record which is actually ok because she apparently has 2 children with different birthdates whereas John's duplicate record is obviously created because it was entered twice; once without entering
a language and the second time specifying the language?

I have tried a number of things short of creating a cursor which isn't really the best way to resolve this issue since there are millions of rows.

Anyone out there have any input that be helpful? Or has anyone ever had this similar issue? I would be interested in knowing how you addressed the problem.

Any help is appreciated.

**nosepicker** · 11-21-2007, 11:47 AM

Would something like this work for you (just change the SELECT to a DELETE)?

SELECT A.*
FROM YourTable A JOIN YourTable B
ON A.firstN = B.firstN
AND A.lastN = B.lastN
AND A.address = B.address
AND A.birthdate = B.birthdate
AND A.lang <> B.lang
AND A.lang = 'UNK'

**mindscape** · 11-22-2007, 09:57 AM

Thanks Nosepicker that actually would work real good and you made it so painstakingly simple that I feel a bit stupid for not looking at the obvious.

Guess I was focusing too much on eliminating duplicates under normal circumstances and didn't step back and look at the solution from a different angle.

Appreciate the guidance.
Thx much.

Thread: Separating desired from undesired duplicate records

Thread Tools

Display

Separating desired from undesired duplicate records

Separating desired from undesired duplicate records

Posting Permissions