Results 1 to 3 of 3

Thread: Querying for duplicate fields

  1. #1
    Joy Guest

    Querying for duplicate fields

    This sounds simple but having trouble constructing the query - I'm trying to write a query that will pull out all the duplicate fields from a
    very huge database. There are around 200,000 records with a field called 'code' Code is a CHAR(9) field with no unique constraint.
    I'm just trying to check the data to make sureall the data there is unique.

    I ran 2 queries to check this:

    1. Select count(code) from TABLE
    RESULT (200,000 rows affected)

    2. Select distinct count(code) from TABLE
    RESULT (199,950 rows affected)

    So apparenlyt 50 codes are repeats of each other. I don't want to scroll down 200,000 records to determine which are duplicates. I tried group by but that still generates a whole
    list of (1's) which still requires me having to scroll down the page slowly to see if there is a number greater than 2.

    Any know of an easier way I can get this info?
    Joy

  2. #2
    Janet McQuade Guest

    Querying for duplicate fields (reply)




    ------------
    Joy at 5/26/99 11:40:45 AM

    This sounds simple but having trouble constructing the query - I'm trying to write a query that will pull out all the duplicate fields from a
    very huge database. There are around 200,000 records with a field called 'code' Code is a CHAR(9) field with no unique constraint.
    I'm just trying to check the data to make sureall the data there is unique.

    I ran 2 queries to check this:

    1. Select count(code) from TABLE
    RESULT (200,000 rows affected)

    2. Select distinct count(code) from TABLE
    RESULT (199,950 rows affected)

    So apparenlyt 50 codes are repeats of each other. I don't want to scroll down 200,000 records to determine which are duplicates. I tried group by but that still generates a whole
    list of (1's) which still requires me having to scroll down the page slowly to see if there is a number greater than 2.

    Any know of an easier way I can get this info?
    Joy



    How about:
    select CODE, count(*) from TABLE group by CODE having count(*) > 1
    Janet

  3. #3
    Andy hughes Guest

    Querying for duplicate fields (reply)

    Try something like this:

    select code, coumt(*)
    from table
    group by code
    having count(*) > 1

    This will give all codes which appear more than once.

    As for actually removing them.....

    This is never simple and will depend on the structure of the table and keys. Assuming you can't uniquely identify the rows to be deleted, you could use a separate table with an index set up to automatically dedupe records (check the unique and ignore dupplicate options when creating it). The constituents of the indexc will again depend on the data structure.

    You will then copy from source to dedupe, remove from source, copy from dedupe to.

    Hope this helps
    Andy
    ------------
    Joy at 5/26/99 11:40:45 AM

    This sounds simple but having trouble constructing the query - I'm trying to write a query that will pull out all the duplicate fields from a
    very huge database. There are around 200,000 records with a field called 'code' Code is a CHAR(9) field with no unique constraint.
    I'm just trying to check the data to make sureall the data there is unique.

    I ran 2 queries to check this:

    1. Select count(code) from TABLE
    RESULT (200,000 rows affected)

    2. Select distinct count(code) from TABLE
    RESULT (199,950 rows affected)

    So apparenlyt 50 codes are repeats of each other. I don't want to scroll down 200,000 records to determine which are duplicates. I tried group by but that still generates a whole
    list of (1's) which still requires me having to scroll down the page slowly to see if there is a number greater than 2.

    Any know of an easier way I can get this info?
    Joy

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •