Results 1 to 3 of 3

Thread: Separating desired from undesired duplicate records

  1. #1
    Join Date
    Oct 2007
    Posts
    15

    Separating desired from undesired duplicate records

    I know of several methods to remove duplicate records but I recently encountered a unique situation where some duplicate records were actually acceptable.

    Here is my situation:

    I have a table that contains records of individuals who have children so 1 person can have 3 children with different birthdates; but there is also a field that has a specified language. The challenge arises when an individual may have only 1 child but has entered a single record twice. Once with a specifed language and again without which produces a default value of UNKNOWN.

    I need to be able to remove this record without affecting records that may have a record entered twice as well; having 2 children for instance but they may also have a specified language in one record but a default value of UNKNOWN for their second record.

    So I can't eliminate the unwanted duplicates by filtering out records that have UNKNOWN because I would also remove individuals that I need.

    EX:
    firstN | lastN | address | lang | childs birthdate
    John Doe 210 Somewhere Ave ENG 1993-10-09
    John Doe 210 Somewhere Ave UNK 1993-10-09
    Jane Doe 210 Anywhere Ave ENG 1969-12-23
    Jane Doe 210 Anywhere Ave UNK 1958-04-15

    How could you remove the duplicate for John in this example without affecting Janes duplicate record which is actually ok because she apparently has 2 children with different birthdates whereas John's duplicate record is obviously created because it was entered twice; once without entering
    a language and the second time specifying the language?

    I have tried a number of things short of creating a cursor which isn't really the best way to resolve this issue since there are millions of rows.

    Anyone out there have any input that be helpful? Or has anyone ever had this similar issue? I would be interested in knowing how you addressed the problem.

    Any help is appreciated.

  2. #2
    Join Date
    Dec 2004
    Posts
    502
    Would something like this work for you (just change the SELECT to a DELETE)?


    SELECT A.*
    FROM YourTable A JOIN YourTable B
    ON A.firstN = B.firstN
    AND A.lastN = B.lastN
    AND A.address = B.address
    AND A.birthdate = B.birthdate
    AND A.lang <> B.lang
    AND A.lang = 'UNK'

  3. #3
    Join Date
    Oct 2007
    Posts
    15

    Thumbs up Separating desired from undesired duplicate records

    Thanks Nosepicker that actually would work real good and you made it so painstakingly simple that I feel a bit stupid for not looking at the obvious.

    Guess I was focusing too much on eliminating duplicates under normal circumstances and didn't step back and look at the solution from a different angle.

    Appreciate the guidance.
    Thx much.
    Last edited by mindscape; 11-22-2007 at 09:58 AM. Reason: typo in language

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •