Results 1 to 2 of 2

Thread: Joins and Data Warehouse

  1. #1
    Join Date
    Jul 2004
    Posts
    17

    Joins and Data Warehouse

    {This message also posted in the Oracle forum}

    To follow the discussion, it's probably easier if you understand sports.

    In baseball, every event is tracked. In addition to events, you have a configuration (the identies of the players, the park, etc). If this was simply one game, I would create the tables in 3NF, and do the joins, and be on my way. However, I'm going to end up with a few million records, with the main key being the gameid (of which there'd be a few thousand).

    Suppose I'm interested in knowing who each player plays with at each event. (In a game, there's about 10-15 players that play for each team, so you are talking about getting a result back of games x events x 25squared.)

    My issue. Since I process each game individually, would it make more sense to do the join at that level? That is, my choice is:
    1 - read in 10,000 games, with 300 events each (= 3 million rows)
    2 - load them in 3NF tables
    3 - do massive join (resulting in 10,000 x 300 x 625 = 2 billion rows)
    OR
    1 - read in 1 game, 300 events each in 3NF tables
    2 - do massive join (resulting in 200,000 rows per game)
    3 - load in redundant tables, clear out the 3NF tables, and go to step 1 (doing this 10,000 times, for a total of 2 billion rows)


    The important thing to remember is that the minimum join key is the gameid. If I've got millions of records, but I know off the bat that almost all won't satisfy the self-join condition
    (e1.gameid = e2.gameid), wouldn't it make more sense to do the self-join upfront, before loading the data?

    Thanks, Tom
    Last edited by tmasc; 06-28-2006 at 09:07 PM.

  2. #2
    Join Date
    Jun 2006
    Posts
    4
    wouldn't each player have a foreign key to the event and the game and would you join on those?

    How often does the data change? daily? weekly?

    to avoid slow response from the query you can create a materialized view to query

    clp

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •