
Thread: Need help on SP to normalize data

  1. #1
    Join Date
    Mar 2003
    Posts
    4

    Need help on SP to normalize data

    We have a stored proc that runs nightly and takes 90 minutes. The source table holds about 630,000 rows, and the destination table ends up with about 765,000 rows.

    It basically normalizes data from one table into another. Because we can't track changes in the source table, we have to repopulate the second table nightly.

    One record of source data looks like this:
    (table A)
    Song_ID  Terr  Catalog          Percent        Fix            Admin
    1204     1     GC62755\GC61378  500000\500000  ABCD|E\ABCD|E  P10336|P1314\P10336|P1314

    The normalized data (4 records) looks like this:
    (table B)
    Song  Catalog  Admin   Fix   Terr  Percent
    1204  GC61378  P10336  ABCD  1     500000
    1204  GC61378  P1314   E     1     500000
    1204  GC62755  P10336  ABCD  1     500000
    1204  GC62755  P1314   E     1     500000

    Does any approach jump out for parsing this "multi-valued" data more easily than running a pile of string functions (e.g. CHARINDEX, SUBSTRING, LEN) against each record?
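
    One set-based alternative that often comes up is a "numbers" (tally) table join. A minimal sketch, assuming hypothetical names dbo.Numbers and dbo.TableA and VARCHAR columns, shown here for the Catalog column only:

    -- One-time setup (hypothetical helper table): integers 1..8000,
    -- i.e. at least as long as the longest delimited string.
    CREATE TABLE dbo.Numbers (n INT NOT NULL PRIMARY KEY)
    DECLARE @i INT
    SET @i = 1
    WHILE @i <= 8000
    BEGIN
        INSERT INTO dbo.Numbers (n) VALUES (@i)
        SET @i = @i + 1
    END

    -- Split the backslash-delimited Catalog column into one row per
    -- value. Pos is the segment's ordinal position, so it can later
    -- be matched to the corresponding segment of Percent, Fix and
    -- Admin. NULL or empty Catalog values drop out of the inner join.
    SELECT
        a.Song_ID,
        a.Terr,
        1 + n.n - LEN(REPLACE(LEFT(a.Catalog, n.n), '\', '')) AS Pos,
        SUBSTRING(a.Catalog, n.n,
                  CHARINDEX('\', a.Catalog + '\', n.n) - n.n) AS CatalogValue
    FROM dbo.TableA AS a
    JOIN dbo.Numbers AS n
      ON n.n <= LEN(a.Catalog)
     AND SUBSTRING('\' + a.Catalog, n.n, 1) = '\'

    The join keeps only the positions where a new segment starts (position 1, plus the spot right after each backslash), so each qualifying row of dbo.Numbers yields exactly one output row per segment, with no per-row looping.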

  2. #2
    Join Date
    Dec 2004
    Posts
    502
    Is there a fixed number of "denormalized" values for each column, or at least do you know if there is a maximum number of "denormalized" values? In other words, is there a fixed or maximum number of backslashes (\) and pipes (|) that each column can have? If so, I have a potential solution in mind, although it involves several steps.
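
    For what it's worth, here is the kind of multi-step trick that only works with a hard cap. A hedged sketch, assuming at most four backslash-separated segments per column, no periods in the real data, and the table/column names from the first post:

    -- PARSENAME splits a string on '.' into at most 4 parts, counted
    -- from the RIGHT (part 1 is the last segment), each part limited
    -- to 128 characters. Swapping '\' for '.' lets it split our data.
    SELECT a.Song_ID,
           PARSENAME(REPLACE(a.Catalog, '\', '.'), p.part) AS CatalogValue
    FROM dbo.TableA AS a
    JOIN (SELECT 1 AS part UNION ALL SELECT 2
          UNION ALL SELECT 3 UNION ALL SELECT 4) AS p
      ON PARSENAME(REPLACE(a.Catalog, '\', '.'), p.part) IS NOT NULL

    Past four segments this breaks down entirely, so it only answers the "fixed or maximum number" case asked about above.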

  3. #3
    Join Date
    Mar 2003
    Posts
    4
    There's no fixed limit. A column can be NULL, hold a single value (no '\'), or be multi-valued and sub-valued. The '\' separates the values, and the '|' separates the sub-values. That's what makes this such a manual process: you can't predict what will be there. The format is consistent, though, and follows rules.
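
    Since the positions line up across columns, the variable-length case can still be handled set-based by composing two splits. A rough sketch in the same hypothetical setup as above: split Fix on '\' keeping the segment position, then split each segment on '|' keeping the sub-position:

    -- Inner derived table: first-level split of Fix on '\', same
    -- pattern as the Catalog split sketched earlier.
    -- Outer query: second-level split of each segment on '|'.
    SELECT s.Song_ID,
           s.Pos,
           1 + m.n - LEN(REPLACE(LEFT(s.Seg, m.n), '|', '')) AS SubPos,
           SUBSTRING(s.Seg, m.n,
                     CHARINDEX('|', s.Seg + '|', m.n) - m.n) AS FixSub
    FROM (
        SELECT a.Song_ID,
               1 + n.n - LEN(REPLACE(LEFT(a.Fix, n.n), '\', '')) AS Pos,
               SUBSTRING(a.Fix, n.n,
                         CHARINDEX('\', a.Fix + '\', n.n) - n.n) AS Seg
        FROM dbo.TableA AS a
        JOIN dbo.Numbers AS n
          ON n.n <= LEN(a.Fix)
         AND SUBSTRING('\' + a.Fix, n.n, 1) = '\'
    ) AS s
    JOIN dbo.Numbers AS m
      ON m.n <= LEN(s.Seg)
     AND SUBSTRING('|' + s.Seg, m.n, 1) = '|'

    The same pattern on Admin yields (Song_ID, Pos, SubPos, AdminSub); join the single-level Catalog and Percent splits on (Song_ID, Pos), and Fix to Admin additionally on SubPos, and the result is the four rows of table B in the first post.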

  4. #4
    Join Date
    Mar 2003
    Posts
    4
    We have some song entries with up to 10 or 11 catalog ids.

  5. #5
    Join Date
    Nov 2002
    Location
    DE
    Posts
    246
    Maybe you should approach the problem from a different perspective?

    1) According to your initial post, the ratio between source and destination records is about 630k : 765k.

    2) If I understand the data correctly, the relation between source and destination is 1:n (each record in the source table produces at least one record in the destination table).

    3) Taking 1 and 2 together, at most 135,000 source records split into more than one destination record: 765k - 630k = 135k extra rows, and every record that splits accounts for at least one of them. So the split logic does real work on at most 21% (135k / 630k) of the records.

    If you optimize the split statement by, say, 10%, the overall effect is only about 2% (10% of 21%).
    Just an idea, but maybe it helps to look at the problem from a different perspective?
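
    To make that concrete, a hedged sketch (table and column names assumed from the first post): copy the single-valued rows straight across and reserve the parsing for rows that actually contain delimiters:

    -- Fast path: rows with no '\' and no '|' anywhere need no parsing.
    -- (Adjust the delimiter test to your real business rules.)
    INSERT INTO dbo.TableB (Song, Catalog, Admin, Fix, Terr, [Percent])
    SELECT Song_ID, Catalog, Admin, Fix, Terr, [Percent]
    FROM dbo.TableA
    WHERE CHARINDEX('\', ISNULL(Catalog, '')) = 0
      AND CHARINDEX('|', ISNULL(Fix, '')) = 0
      AND CHARINDEX('|', ISNULL(Admin, '')) = 0
    -- Slow path: run the existing split logic only on the remaining
    -- rows, i.e. with the opposite WHERE filter (at most ~21% of the
    -- table per the estimate above).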
