
Thread: Need help on SP to normalize data

  1. #1
    Join Date
    Mar 2003
    Posts
    4

    Need help on SP to normalize data

    We have a stored proc that runs nightly and takes 90 minutes. The source table holds about 630,000 rows, and the destination table ends up with about 765,000 rows.

    It basically normalizes data from one table into another. Because we can't track changes in the source table, we have to repopulate the second table nightly.

    One record of source data looks like this:
    (table A)
    Song_ID  Terr  Catalog          Percent        Fix            Admin
    1204     1     GC62755\GC61378  500000\500000  ABCD|E\ABCD|E  P10336|P1314\P10336|P1314

    The normalized data (4 records) looks like this:
    (table B)
    Song  Catalog  Admin   Fix   Terr  Percent
    1204  GC61378  P10336  ABCD  1     500000
    1204  GC61378  P1314   E     1     500000
    1204  GC62755  P10336  ABCD  1     500000
    1204  GC62755  P1314   E     1     500000

    Does any approach jump out for parsing this "multi-valued" data more easily than running a pile of string functions (e.g. CHARINDEX, SUBSTRING, LEN) against each record?
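
    One set-based alternative that often comes up is a "numbers" (tally) table join. A minimal sketch, assuming hypothetical names dbo.Numbers and dbo.TableA and VARCHAR columns, shown here for the Catalog column only:

    -- One-time setup (hypothetical helper table): integers 1..8000,
    -- i.e. at least as long as the longest delimited string.
    CREATE TABLE dbo.Numbers (n INT NOT NULL PRIMARY KEY)
    DECLARE @i INT
    SET @i = 1
    WHILE @i <= 8000
    BEGIN
        INSERT INTO dbo.Numbers (n) VALUES (@i)
        SET @i = @i + 1
    END

    -- Split the backslash-delimited Catalog column into one row per
    -- value. Pos is the segment's ordinal position, so it can later
    -- be matched to the corresponding segment of Percent, Fix and
    -- Admin. NULL or empty Catalog values drop out of the inner join.
    SELECT
        a.Song_ID,
        a.Terr,
        1 + n.n - LEN(REPLACE(LEFT(a.Catalog, n.n), '\', '')) AS Pos,
        SUBSTRING(a.Catalog, n.n,
                  CHARINDEX('\', a.Catalog + '\', n.n) - n.n) AS CatalogValue
    FROM dbo.TableA AS a
    JOIN dbo.Numbers AS n
      ON n.n <= LEN(a.Catalog)
     AND SUBSTRING('\' + a.Catalog, n.n, 1) = '\'

    The join keeps only the positions where a new segment starts (position 1, plus the spot right after each backslash), so each qualifying row of dbo.Numbers yields exactly one output row per segment, with no per-row looping.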

  2. #2
    Join Date
    Dec 2004
    Posts
    502
    Is there a fixed number of "denormalized" values for each column, or at least do you know if there is a maximum number of "denormalized" values? In other words, is there a fixed or maximum number of backslashes (\) and pipes (|) that each column can have? If so, I have a potential solution in mind, although it involves several steps.
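
    For what it's worth, here is the kind of multi-step trick that only works with a hard cap. A hedged sketch, assuming at most four backslash-separated segments per column, no periods in the real data, and the table/column names from the first post:

    -- PARSENAME splits a string on '.' into at most 4 parts, counted
    -- from the RIGHT (part 1 is the last segment), each part limited
    -- to 128 characters. Swapping '\' for '.' lets it split our data.
    SELECT a.Song_ID,
           PARSENAME(REPLACE(a.Catalog, '\', '.'), p.part) AS CatalogValue
    FROM dbo.TableA AS a
    JOIN (SELECT 1 AS part UNION ALL SELECT 2
          UNION ALL SELECT 3 UNION ALL SELECT 4) AS p
      ON PARSENAME(REPLACE(a.Catalog, '\', '.'), p.part) IS NOT NULL

    Past four segments this breaks down entirely, so it only answers the "fixed or maximum number" case asked about above.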

  3. #3
    Join Date
    Mar 2003
    Posts
    4
    There's no fixed limit. A column can be NULL, hold a single value (no '\'), or be multi-valued and sub-valued. The '\' separates the values, and the '|' separates the sub-values. That's what makes this such a manual process: you can't predict what will be there. The format is consistent, though, and follows rules.
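
    Since the positions line up across columns, the variable-length case can still be handled set-based by composing two splits. A rough sketch in the same hypothetical setup as above: split Fix on '\' keeping the segment position, then split each segment on '|' keeping the sub-position:

    -- Inner derived table: first-level split of Fix on '\', same
    -- pattern as the Catalog split sketched earlier.
    -- Outer query: second-level split of each segment on '|'.
    SELECT s.Song_ID,
           s.Pos,
           1 + m.n - LEN(REPLACE(LEFT(s.Seg, m.n), '|', '')) AS SubPos,
           SUBSTRING(s.Seg, m.n,
                     CHARINDEX('|', s.Seg + '|', m.n) - m.n) AS FixSub
    FROM (
        SELECT a.Song_ID,
               1 + n.n - LEN(REPLACE(LEFT(a.Fix, n.n), '\', '')) AS Pos,
               SUBSTRING(a.Fix, n.n,
                         CHARINDEX('\', a.Fix + '\', n.n) - n.n) AS Seg
        FROM dbo.TableA AS a
        JOIN dbo.Numbers AS n
          ON n.n <= LEN(a.Fix)
         AND SUBSTRING('\' + a.Fix, n.n, 1) = '\'
    ) AS s
    JOIN dbo.Numbers AS m
      ON m.n <= LEN(s.Seg)
     AND SUBSTRING('|' + s.Seg, m.n, 1) = '|'

    The same pattern on Admin yields (Song_ID, Pos, SubPos, AdminSub); join the single-level Catalog and Percent splits on (Song_ID, Pos), and Fix to Admin additionally on SubPos, and the result is the four rows of table B in the first post.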

  4. #4
    Join Date
    Mar 2003
    Posts
    4
    We have some song entries with up to 10 or 11 catalog ids.

  5. #5
    Join Date
    Nov 2002
    Location
    DE
    Posts
    246
    Maybe you should approach the problem from a different perspective?

    1) According to your initial post, the ratio between source and destination records is about 630k : 765k.

    2) If I understand the data correctly, the relation between source and destination is 1:n (each record in the source table produces at least one record in the destination table).

    3) Taking 1 and 2 together, at most 135,000 source records split into more than one destination record: 765k - 630k = 135k extra rows, and every record that splits accounts for at least one of them. So the split logic does real work on at most 21% (135k / 630k) of the records.

    If you optimize the split statement by, say, 10%, the overall effect is only about 2% (10% of 21%).
    Just an idea, but maybe it helps to look at the problem from a different perspective?
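
    To make that concrete, a hedged sketch (table and column names assumed from the first post): copy the single-valued rows straight across and reserve the parsing for rows that actually contain delimiters:

    -- Fast path: rows with no '\' and no '|' anywhere need no parsing.
    -- (Adjust the delimiter test to your real business rules.)
    INSERT INTO dbo.TableB (Song, Catalog, Admin, Fix, Terr, [Percent])
    SELECT Song_ID, Catalog, Admin, Fix, Terr, [Percent]
    FROM dbo.TableA
    WHERE CHARINDEX('\', ISNULL(Catalog, '')) = 0
      AND CHARINDEX('|', ISNULL(Fix, '')) = 0
      AND CHARINDEX('|', ISNULL(Admin, '')) = 0
    -- Slow path: run the existing split logic only on the remaining
    -- rows, i.e. with the opposite WHERE filter (at most ~21% of the
    -- table per the estimate above).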
