Results 1 to 9 of 9

Thread: Comparing and filtering??

  1. #1
    Join Date
    Sep 2003
    Posts
    5

    Question Comparing and filtering??

    Hello all,

    I'm new to SQL Server so if the following sounds stupid then apologies. I am trying to design a query which compares columns and filters according to the result of the comparison. We are a UK based charity that provides financial help to families with disabled children (www.familyfund.org.uk). We have a large database (250,000 entries) which we know contains some duplicate/split files from a recent migration. We need to identify these files but not delete them. Currently I am using the following:

    SELECT dbo.Families.famId, dbo.Address.street, dbo.Children.childId, dbo.Address.postcode
    FROM dbo.Children INNER JOIN
    dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
    dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
    dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
    WHERE (NOT (dbo.Children.eligStatus IN (3, 4)))
    GROUP BY dbo.Children.childId, dbo.Address.postcode, dbo.Families.famId, dbo.Address.street
    HAVING (dbo.Address.street IS NOT NULL)
    ORDER BY dbo.Address.street

    Obviously this returns all 250,000 records and then we have to search manually. We would like to run a query which compares families.famID to address.street so that where famId has more than one address attched it is returned to the results grid. Does this make sense? is it possible? Any help would be gratefully received

    Thanks in advance
    Mark

  2. #2
    Join Date
    Jan 2003
    Location
    Switzerland
    Posts
    14
    Not sure I got the question right, but try this

    SELECT count(*), dbo.Families.famId, dbo.Address.street
    FROM dbo.Children INNER JOIN
    dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
    dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
    dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
    WHERE (NOT (dbo.Children.eligStatus IN (3, 4)))
    GROUP BY dbo.Families.famId, dbo.Address.street
    HAVING count(*) > 1
    ORDER BY dbo.Address.street

  3. #3
    Join Date
    Sep 2003
    Posts
    5
    Thanks for this but maybe I need to clarify. The split files can have seperate ID's eg:

    famID Street
    123456 55 James Street
    654321 55 James Street

    Where the street is the same for 2 different famId's I need to identify both the famId's in the results grid?

    Hope this clarifies!

    Thanks again
    Mark

  4. #4
    Join Date
    Jan 2003
    Location
    Switzerland
    Posts
    14
    Before going further, a small question :

    In the example you give

    famID Street
    123456 55 James Street
    654321 55 James Street

    do both "55 James Street" have the same street.id or not ?

    CVM.

  5. #5
    Join Date
    Sep 2003
    Posts
    5
    Hi,

    What we have is a relational dbase of families some of who have morethan one child so we could have something like:

    famID Street
    123456 55 James Street
    123456 55 James Street
    123456 55 James Street

    which is valid as the family can have 3 children (i will probably add childId into the final query).

    we could also have:

    famID Street
    123456 55 James Street
    123456 55 James Street
    123456 55 James Street
    654321 55 James Street

    The 654321 famId would be a duplicate/split file as we only register families under 1 unique address (eg. 55 James Street).

    I would ideally like the query to return those unique addresses that have more than one famID allocated to them - apologies as I probably didnt explain this clearly in my first post but hope that this clarifies it a bit more

    Thanks again
    Mark

  6. #6
    Join Date
    Jan 2003
    Location
    Switzerland
    Posts
    14
    Hi again,

    I typed a little too fast in my previous response. The question was not about street.id being unique (as there is no such thing as a "Street" table, but address.id.

    Anyway, a little subquery might do the trick :

    /* your original query without group by clause */
    SELECT dbo.Families.famId, dbo.Address.street, dbo.Children.childId, dbo.Address.postcode
    FROM dbo.Children INNER JOIN
    dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
    dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
    dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
    WHERE (NOT (dbo.Children.eligStatus IN (3, 4)))
    /* additional where clause */
    AND dbo.address.street in
    /* start subselect */
    (SELECT dbo.Address.street
    FROM dbo.Children INNER JOIN
    dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
    dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
    dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
    GROUP BY dbo.Address.street
    HAVING count(*) > 1)
    /* end subselect */
    ORDER BY dbo.Address.street

    With a bit of luck, we might even end this thread today.

    CVM.

  7. #7
    Join Date
    Sep 2003
    Posts
    5
    Hi again,

    I just tried this and i'm afraid it still returns all records (272,400!) not just the duplicates - sorry. I've tried tinkering with the joins but with no luck.

    If its of any use the columns come from the following tables:

    'famID' from 'dbo.families' (PK)
    'street' from 'dbo.address'
    'childId' from 'dbo.children' (PK)
    'postcode' from 'dbo.address'

    Sorry to be a pain but any help is greatly appreciated as I am new to SQL Server and i'm completely stumped!

    Thanks again
    Mark

  8. #8
    Join Date
    Jan 2003
    Location
    Switzerland
    Posts
    14
    Well, well. Are we having fun or what ?

    /* your original query without group by clause */
    SELECT dbo.Families.famId, dbo.Address.street, dbo.Children.childId, dbo.Address.postcode
    FROM dbo.Children INNER JOIN
    dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
    dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
    dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
    WHERE (NOT (dbo.Children.eligStatus IN (3, 4)))
    /* additional where clause */
    AND dbo.address.street in
    /* start first subselect */
    (select family_by_street.street from
    /* start second subselect*/
    (SELECT dbo.families.id as famid, dbo.Address.street as street
    FROM dbo.Families INNER JOIN
    dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
    dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
    GROUP BY dbo.families.famid, dbo.Address.street) as family_by_street
    /* end second subselect */
    group by family_by_street.street
    HAVING count(*) > 1)
    /* end first subselect */
    ORDER BY dbo.Address.street

    Now, the second subselect (you should test if this is correct) should return all distinct combinations of FamId, Street. I aliased the second subselect as "family_by_street".

    The first subselect returns duplicate street-values from the second subselect. You can also test this.
    The final select returns only rows having a matching street-value in the "dupes" streets.

    You might have noticed that I dropped the join with the "Children" table in the second subselect, as it seems to me that the number of childrens per family is irrelevant.

    CVM.

  9. #9
    Join Date
    Sep 2003
    Posts
    5
    Hi there,

    just been checking this out and it actually does more than we require which is GREAT, THANKS!!

    It actually picks up all the anomalies, we still have to manually check the files but instead of 200,000 we now only have to check 19,000 of which we know about 10,000 are fine so it saves us a hell of lot time and energy!

    Thanks again
    Best regards
    Mark

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •