
Thread: Table with 6 billion rows - WHERE on a non-indexed column

  1. #1
    Join Date
    May 2009
    Posts
    2

Table with 6 billion rows - WHERE on a non-indexed column

    I have a huge table with 6 billion rows.
    The columns in the table are Field1, field2...
    Field1 contains data in the format of 'Arguments=Value1|Value2|Value3|Value4|Value5'

Field1 in the table is not indexed.
    I need to select from this table based on a specific value of Value3.

    SELECT *
    FROM tablename E (nolock)
    WHERE
        CASE
            WHEN CHARINDEX('|', E.field1) + 1 > 0
                 AND CHARINDEX('|', E.field1, CHARINDEX('|', E.field1) + 1) - CHARINDEX('|', E.field1) - 1 > 0
            THEN SUBSTRING(E.field1, CHARINDEX('|', E.field1) + 1, CHARINDEX('|', E.field1, CHARINDEX('|', E.field1) + 1) - CHARINDEX('|', E.field1) - 1)
            ELSE '0'
        END IN ('2401072700', '2264935814');

    May I know if this SQL is OK to run?
    Will the transaction log file grow enormously while the query is running?
    What other issues might I need to take care of?

  2. #2
    Join Date
    Sep 2002
    Posts
    5,938
    A select statement doesn't write anything to the log.

  3. #3
    Join Date
    Apr 2009
    Posts
    86
    Yes. The SQL is okay to run. Just start it before you go on a 2-week vacation and it might be finished by the time you get back.

    Issues you need to take care of.

    1) Redesign the table. A column should contain a single value. Storing multiple values in a single column is not only bad design; it also leaves you nothing to tune for performance. The reason there is no index on Field1 is that an index on it would be worthless for this kind of search.

    At the least there should be 6 columns: Value1 in Col1, Value2 in Col2, etc. Once this is done you can create an index on Col1 and/or Col2 and/or Col3, etc.

    2) see number 1.
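    To make point 1 concrete, here is a sketch of what the normalized layout could look like. All table, column, and index names here are made up for illustration; the real schema would have to come from the vendor.

    ```sql
    -- Hypothetical normalized layout: one value per column.
    CREATE TABLE tablename_normalized (
        Value1 varchar(50),
        Value2 varchar(50),
        Value3 varchar(50),
        Value4 varchar(50),
        Value5 varchar(50)
        -- ... plus the remaining columns (field2, ...)
    );

    -- With one value per column, an ordinary index makes the lookup cheap:
    CREATE INDEX IX_tablename_Value3 ON tablename_normalized (Value3);

    -- The original query then collapses to a simple, indexable predicate:
    SELECT *
    FROM tablename_normalized
    WHERE Value3 IN ('2401072700', '2264935814');
    ```
    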

  4. #4
    Join Date
    May 2009
    Posts
    2
    Thanks SDas.
    This is exactly what I expected.
    I know that this is a bad design, but we have to live with it: the table comes as part of a product from an external vendor, and we do not have a say in changing the table design.

    But can you please tell me how I can convince the people around me that it will take 2 weeks?
    I am new to SQL server.
    However, I have worked in other databases.

    1)
    Is there an explain plan in SQL Server which can estimate and tell upfront that it will take 2 weeks (or more) to return the rows?
    Can the plan be produced without actually executing the query?

    2)
    When the SQL is executed, will the transaction log grow to an enormous size, or will it stay at the same size as before the SQL is executed?
    I ran a SELECT statement the other day and the transaction log grew in size, but that SQL had an INNER JOIN between two big tables of 6 billion rows (one of them was the same table as above).
    This SQL has no JOIN. Will it still consume the same amount of transaction log space as before?


    3)
    The other thing is that the very same table is populated on a daily basis: once every 24 hours, new records are inserted into the table, about 10 million every day.
    Assuming I run the SQL, which also has NOLOCK in it, while the load is running, will it affect the table loading, or will it just be a dirty read? Can you please explain?

    Thanks

  5. #5
    Join Date
    Apr 2009
    Posts
    86
    Just to be clear, the '2 weeks' is a made-up number. I know it will run for a very long time, but there is no real way to tell how long without actually running the query.

    1) Yes, you can run an Estimated Execution Plan on the query without actually running it. Since I don't know your SQL Server version, I can't tell you exactly how. In general, in Management Studio (or Query Analyzer on older versions), where you can run the query, there should be an option to display the Estimated Execution Plan. (I don't have SQL Server 2008 but can check 2005 and 2000.) Expect a full table scan, since there is no index on Field1.
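    For reference, one way to get the estimated plan in text form without executing the query is the SHOWPLAN setting, which exists across these SQL Server versions (the predicate below is just a placeholder for the real one):

    ```sql
    -- Show the estimated plan only; the query is compiled but NOT executed.
    SET SHOWPLAN_ALL ON;
    GO

    SELECT *
    FROM tablename E (nolock)
    WHERE E.field1 LIKE '%|2401072700|%';  -- placeholder: your real predicate goes here
    GO

    SET SHOWPLAN_ALL OFF;
    GO
    ```

    Note that even the estimated plan will not give you an elapsed-time figure; it shows the access path (here, a table scan) and relative cost estimates.
    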

    2) As rmiao stated, the transaction log should only record transactions (Insert, Update, Delete). A Select would have no reason to write to the transaction log. If it was increasing in size while your other query was running, something else must have been causing it.

    3) I don't know enough about SQL Server to know how the NO LOCK and the 10 million row insert will react with each other (I am mainly a DB2 DBA with some knowledge about SQL Server).

    I am not sure what you mean by "... it is being served as product by one of the external vendors.". How is it being used?

    Would it be possible to create a multi-column table and create a view that concatenates the columns to replicate what the current table looks like?
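    As a hedged sketch of that idea, assuming the vendor only needs to read the concatenated shape (table, view, and column names here are made up):

    ```sql
    -- Hypothetical: store the values in separate, indexable columns,
    -- then expose the vendor's expected single-column layout through a view.
    CREATE VIEW tablename_compat AS
    SELECT 'Arguments=' + Value1 + '|' + Value2 + '|' + Value3
           + '|' + Value4 + '|' + Value5 AS field1,
           field2
    FROM tablename_normalized;
    ```

    Your own queries would hit the base table's indexed columns directly, while anything expecting the old layout could read the view.
    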

    PS If nothing else, try an index on Field1. This might at least give you an index scan. Since more keys fit on a page than data rows, there will be fewer pages read. If you can add all the columns used by the query to the index (with FIELD1 first), you can get index-only access and not have to read both index and data pages. (Or you can make the index on FIELD1 a clustered index.) Both of these should reduce the elapsed time, but I can't begin to tell you by how much.
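    As a sketch of those options (index names are made up, and on a 6-billion-row table each CREATE INDEX will itself take a long time and a lot of space):

    ```sql
    -- Plain nonclustered index on Field1: may turn the table scan into an index scan.
    CREATE INDEX IX_tablename_field1 ON tablename (field1);

    -- Covering index: FIELD1 first, plus the other columns the query reads,
    -- so the query can be answered from index pages alone.
    CREATE INDEX IX_tablename_field1_cov ON tablename (field1, field2);

    -- Or make FIELD1 the clustered index instead
    -- (a table can have only one clustered index):
    -- CREATE CLUSTERED INDEX CIX_tablename_field1 ON tablename (field1);
    ```

    Note the covering index only helps if the query selects just those columns; a SELECT * against a wide table defeats it.
    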

    PPS How long does it take to Insert 10 million rows? Are rows ever deleted from the table or will it continue to grow by 50 million rows per week (assuming a 5 day week)? If nothing is deleted, whatever bad performance you currently have will just continue to get worse.
