Comparing and filtering??

**MAW** · 09-01-2003, 08:41 AM

Hello all,

I'm new to SQL Server so if the following sounds stupid then apologies. I am trying to design a query which compares columns and filters according to the result of the comparison. We are a UK based charity that provides financial help to families with disabled children (www.familyfund.org.uk). We have a large database (250,000 entries) which we know contains some duplicate/split files from a recent migration. We need to identify these files but not delete them. Currently I am using the following:

SELECT dbo.Families.famId, dbo.Address.street, dbo.Children.childId, dbo.Address.postcode
FROM dbo.Children INNER JOIN
dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
WHERE (NOT (dbo.Children.eligStatus IN (3, 4)))
GROUP BY dbo.Children.childId, dbo.Address.postcode, dbo.Families.famId, dbo.Address.street
HAVING (dbo.Address.street IS NOT NULL)
ORDER BY dbo.Address.street

Obviously this returns all 250,000 records and then we have to search manually. We would like to run a query which compares families.famID to address.street so that where famId has more than one address attched it is returned to the results grid. Does this make sense? is it possible? Any help would be gratefully received

Thanks in advance
Mark

**cvandemaele** · 09-02-2003, 06:09 AM

Not sure I got the question right, but try this

SELECT count(*), dbo.Families.famId, dbo.Address.street
FROM dbo.Children INNER JOIN
dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
WHERE (NOT (dbo.Children.eligStatus IN (3, 4)))
GROUP BY dbo.Families.famId, dbo.Address.street
HAVING count(*) > 1
ORDER BY dbo.Address.street

**MAW** · 09-02-2003, 06:25 AM

Thanks for this but maybe I need to clarify. The split files can have seperate ID's eg:

famID Street
123456 55 James Street
654321 55 James Street

Where the street is the same for 2 different famId's I need to identify both the famId's in the results grid?

Hope this clarifies!

Thanks again
Mark

**cvandemaele** · 09-02-2003, 07:17 AM

Before going further, a small question :

In the example you give

famID Street
123456 55 James Street
654321 55 James Street

do both "55 James Street" have the same street.id or not ?

CVM.

**MAW** · 09-02-2003, 07:37 AM

Hi,

What we have is a relational dbase of families some of who have morethan one child so we could have something like:

famID Street
123456 55 James Street
123456 55 James Street
123456 55 James Street

which is valid as the family can have 3 children (i will probably add childId into the final query).

we could also have:

famID Street
123456 55 James Street
123456 55 James Street
123456 55 James Street
654321 55 James Street

The 654321 famId would be a duplicate/split file as we only register families under 1 unique address (eg. 55 James Street).

I would ideally like the query to return those unique addresses that have more than one famID allocated to them - apologies as I probably didnt explain this clearly in my first post but hope that this clarifies it a bit more

Thanks again
Mark

**cvandemaele** · 09-02-2003, 07:53 AM

Hi again,

I typed a little too fast in my previous response. The question was not about street.id being unique (as there is no such thing as a "Street" table, but address.id.

Anyway, a little subquery might do the trick :

/* your original query without group by clause */
SELECT dbo.Families.famId, dbo.Address.street, dbo.Children.childId, dbo.Address.postcode
FROM dbo.Children INNER JOIN
dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
WHERE (NOT (dbo.Children.eligStatus IN (3, 4)))
/* additional where clause */
AND dbo.address.street in
/* start subselect */
(SELECT dbo.Address.street
FROM dbo.Children INNER JOIN
dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
GROUP BY dbo.Address.street
HAVING count(*) > 1)
/* end subselect */
ORDER BY dbo.Address.street

With a bit of luck, we might even end this thread today.

CVM.

**MAW** · 09-02-2003, 09:07 AM

Hi again,

I just tried this and i'm afraid it still returns all records (272,400!) not just the duplicates - sorry. I've tried tinkering with the joins but with no luck.

If its of any use the columns come from the following tables:

'famID' from 'dbo.families' (PK)
'street' from 'dbo.address'
'childId' from 'dbo.children' (PK)
'postcode' from 'dbo.address'

Sorry to be a pain but any help is greatly appreciated as I am new to SQL Server and i'm completely stumped!

Thanks again

Mark

**cvandemaele** · 09-02-2003, 09:42 AM

Well, well. Are we having fun or what ?

/* your original query without group by clause */
SELECT dbo.Families.famId, dbo.Address.street, dbo.Children.childId, dbo.Address.postcode
FROM dbo.Children INNER JOIN
dbo.Families ON dbo.Children.family_no = dbo.Families.famId INNER JOIN
dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
WHERE (NOT (dbo.Children.eligStatus IN (3, 4)))
/* additional where clause */
AND dbo.address.street in
/* start first subselect */
(select family_by_street.street from
/* start second subselect*/
(SELECT dbo.families.id as famid, dbo.Address.street as street
FROM dbo.Families INNER JOIN
dbo.Persons ON dbo.Families.famId = dbo.Persons.famId INNER JOIN
dbo.Address ON dbo.Persons.addressId = dbo.Address.addressId
GROUP BY dbo.families.famid, dbo.Address.street) as family_by_street
/* end second subselect */
group by family_by_street.street
HAVING count(*) > 1)
/* end first subselect */
ORDER BY dbo.Address.street

Now, the second subselect (you should test if this is correct) should return all distinct combinations of FamId, Street. I aliased the second subselect as "family_by_street".

The first subselect returns duplicate street-values from the second subselect. You can also test this.
The final select returns only rows having a matching street-value in the "dupes" streets.

You might have noticed that I dropped the join with the "Children" table in the second subselect, as it seems to me that the number of childrens per family is irrelevant.

CVM.

**MAW** · 09-04-2003, 03:36 AM

Hi there,

just been checking this out and it actually does more than we require which is GREAT, THANKS!!

It actually picks up all the anomalies, we still have to manually check the files but instead of 200,000 we now only have to check 19,000 of which we know about 10,000 are fine so it saves us a hell of lot time and energy!

Thanks again
Best regards
Mark

Thread: Comparing and filtering??

Thread Tools

Display

Comparing and filtering??

Posting Permissions