Results 1 to 4 of 4

Thread: Stock Prices Database Design

  1. #1
    Adam Sneller Guest

    Stock Prices Database Design

    I am building a database to hold 10 years of data for all stocks traded on the NYSE. The size of the project has raised some interesting questions as to exactly how I'm supposed to normalize it.

    The first model is to place the ticker symbol, company name, and description in a separate Tickers table, and group date, open, high, low, close, and volume in a joined data table. The only problem here, it the data table ends up being 250,000,000+ records long. I'm using mySQL, but even this crashes on the 1st query.

    The second model is to create separate Open, High, Low, Close, and Volume tables, with fields of Tickers_ID, Date_01, Date_02, etc. This is then joined to the Ticker's table. This works, but make queries very difficult, as each date has to be named separately in the SELECT statement.

    The third model is to give each stock its own table, all joined to the Tickers table. This works, but since there are over 3000+ stocks on the NYSE, it means I have to name each stock in the SELECT statement for a multi-stock query.

    Phew! I get the feeling I'm missing something obvious. Any suggestions?

  2. #2
    Popov Guest

    Stock Prices Database Design (reply)

    Use SQL Server


    ------------
    Adam Sneller at 7/13/01 12:22:50 AM

    I am building a database to hold 10 years of data for all stocks traded on the NYSE. The size of the project has raised some interesting questions as to exactly how I'm supposed to normalize it.

    The first model is to place the ticker symbol, company name, and description in a separate Tickers table, and group date, open, high, low, close, and volume in a joined data table. The only problem here, it the data table ends up being 250,000,000+ records long. I'm using mySQL, but even this crashes on the 1st query.

    The second model is to create separate Open, High, Low, Close, and Volume tables, with fields of Tickers_ID, Date_01, Date_02, etc. This is then joined to the Ticker's table. This works, but make queries very difficult, as each date has to be named separately in the SELECT statement.

    The third model is to give each stock its own table, all joined to the Tickers table. This works, but since there are over 3000+ stocks on the NYSE, it means I have to name each stock in the SELECT statement for a multi-stock query.

    Phew! I get the feeling I'm missing something obvious. Any suggestions?

  3. #3
    Patrick Guest

    Stock Prices Database Design (reply)

    You are facing 2 issues.
    1) It's necessary to develop a good design for your Database. In this part, we must temporary 'forget' the size issue. You work with the 3rd NF Etc. Regular design...

    2° the size issue must be faced. In your case, we could investigate in the 'table-slicing' I mean several (a lot) tables containing each a period of time like Month or week (a kind of time unit workable). then each table becomes workable and if queries require data inside a month we look at only one table. However the month size is not a solution, the quarter could be your good value.

    I hope this help you.
    Patrick


    ------------
    Popov at 7/22/01 1:06:58 AM

    Use SQL Server


    ------------
    Adam Sneller at 7/13/01 12:22:50 AM

    I am building a database to hold 10 years of data for all stocks traded on the NYSE. The size of the project has raised some interesting questions as to exactly how I'm supposed to normalize it.

    The first model is to place the ticker symbol, company name, and description in a separate Tickers table, and group date, open, high, low, close, and volume in a joined data table. The only problem here, it the data table ends up being 250,000,000+ records long. I'm using mySQL, but even this crashes on the 1st query.

    The second model is to create separate Open, High, Low, Close, and Volume tables, with fields of Tickers_ID, Date_01, Date_02, etc. This is then joined to the Ticker's table. This works, but make queries very difficult, as each date has to be named separately in the SELECT statement.

    The third model is to give each stock its own table, all joined to the Tickers table. This works, but since there are over 3000+ stocks on the NYSE, it means I have to name each stock in the SELECT statement for a multi-stock query.

    Phew! I get the feeling I'm missing something obvious. Any suggestions?

  4. #4
    Join Date
    Aug 2008
    Posts
    1
    Adam,

    It's a long time ago that you posted here, but I am just curious what solution did you implemented? I'm currently hitting the same wall after 2Milion quotes in one table it starts to get slower and I was thinking of the excellent suggestion of Patric.

    Wessel

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •