Page 1 of 2 12 LastLast
Results 1 to 15 of 17

Thread: Survery DB Schema

  1. #1
    Join Date
    May 2004
    Location
    Boston, MA
    Posts
    19

    Question Survery DB Schema

    I am trying to modify a survey DB schema created by OOP(Object Oriented Programming) snobs who had their heads so far up their ass(in my opinion).
    This is current schema created by them.

    Diagram 1 (OOP snobs version)
    http://forums.databasejournal.com/clear.gif

    Actually there aren't a several foreign keys as they are shown in the picture, but that's what they(OOP snobs) are doing to the table "Response". They are inserting question_collection_id and its name as well as question_id and its name to Response table. However, it was not ill-intended, "We just wanted to load the report quickly"(OOP snobs).
    But I really recommend this schema, Because of its simplicity.

    Diagram 2 (My version)
    http://forums.databasejournal.com/clear.gif

    Yet I haven't proven that my schema will provide report just as quickly as OOP snobs' version. As mentioned earlier, since they don't have foreign key constraints accordingly, I do find quite number of discrepancy on values of "question" columns between "Question" and "Response" tables and "question_collection" columns between "question_collection" and "Response" table. I can provide schedule task to eliminate those problems. But I really prefer my schema than theirs. So far, I can explain mine is better just because of its simplicity and DBA's gut instinct. I'd like to hear Superior Expert's opinion which schema is better and why one is better than the other.
    Last edited by DBAStriker; 12-12-2004 at 07:10 PM.

  2. #2
    Join Date
    May 2004
    Location
    Boston, MA
    Posts
    19
    If you weren't able to see images posted in earlier post, here's a zip file that contains jpegs of schema.
    Attached Files Attached Files

  3. #3
    Join Date
    Feb 2003
    Posts
    1,048
    I have to agree with your assessment of where their heads were!!! Their database structure is very bad. Yours is much better.

    There is more to consider than just returning the report. The tables get locked while inserting and updating the table, so bad design can affect output in that manner as well.

    If you post sample queries run for the reports, we could probably provide you some fuel for your argument.

  4. #4
    Join Date
    May 2004
    Location
    Boston, MA
    Posts
    19
    Thanks for your input, RawHide
    What I've shown you earlier was still simple version of what OOP snobs did. But actual thing is far more horrific.
    Following is their original.
    http://forums.databasejournal.com/clear.gif

    Object Oriented Program has gone wild in this. They wanted to reuse "question" in "question_collection", They wanted to reuse "variable" in "question". And I checked the record to if any items have been reused.... Guess what? It never happened!! So with this, if I want to retrieve survey data(In question_collection_type contains data like "banner click count", "movie view progress stats", "survey"), query look like this.
    Code:
    
    SELECT question_collection.question_collection_name, 
           question.question_name,
           customer.first_name,customer.last_name,
           response.response_value,
           response.responded_date
    FROM question_collection INNER JOIN question_collection_type ON question_collection.question_collection_type_id = question_collection_type.question_collection_type_id
    	              INNER JOIN question_collection_question ON question_collection.question_collection_id = question_collection_question.question_collection_id
    	              INNER JOIN question ON question_collection_question.question_id = question.question_id
    	              INNER JOIN variable ON question.variable_id = variable.variable_id
    	              INNER JOIN response ON response.variable_id = variable.variable_id
    	              INNER JOIN customer ON response.customer_id = customer.customer_id
    WHERE question_collection_type.question_collection_type = 'survey' and
         question_collection.question_collection_id ='xxxx-xxxx-xxxx-xxxxxx-xxxx'
    
    No shit, the report wasn't loading quick! We've got 7 tables joined here.
    So, those OOP snobs changed the "response" table to make the report load faster. Which was shown in previous post, but again, this is close to the truth.

    http://forums.databasejournal.com/clear.gif
    With this schema, code gets simpler.
    Code:
    
    SELECT question_collection.question_collection_name, 
           question.question_name,
           customer.first_name,customer.last_name,
           response.response_value,
           response.responded_date
    FROM response INNER JOIN customer ON response.customer_id = customer.customer_id
    	      INNER JOIN question_collection ON response.question_collection_id = question_collection.question_collection_id
    WHERE question_collection_type.question_collection_type = 'survey' and
         question_collection.question_collection_id ='xxxx-xxxx-xxxx-xxxxxx-xxxx'
    
    But as we agreed, I hate having "question_collection_id","question_id", etc.... double whammy crap in "response". We are not re-using variable, questions, question_collections anyway. And also I discovered interesting thing. Currently "response" table holds 1267999 rows and 22571 data pages. Then I copied that table with exact amount. Then modified the table with my schema(http://forums.databasejournal.com/clear.gif). It reduced 5992 data pages! Isn't that something? So far my main arsenal against OOP snobs to convince them to modify with my schema is :
    "Reponse" table is fast data growing table, With current schema it quickly fill up data storage. To keep data growth minimum is possible, using my schema is inevitable.

    But I am still not so sure how I defend myself on report load. With my schema is simple, but it would still require some table joins. Do you think I should create another tables just to store normalized version of "response" table?

  5. #5
    Join Date
    Feb 2003
    Posts
    1,048
    If you're joining the 7 tables on integer id fields that are all indexed, then twice that many tables shouldn't be a speed issue.

    This is going to slow you down. Avoid searching varchar fields whenever you can!!!

    WHERE question_collection_type.question_collection_type = 'survey'


    And what kind of datatype is your question_collection_id? A uniqueidentifier?

    and question_collection_id ='xxxx-xxxx-xxxx-xxxxxx-xxxx'

  6. #6
    Join Date
    May 2004
    Location
    Boston, MA
    Posts
    19
    Thank you Rawhide.

    I prefer using integer as primary keys. I am not totally against it, but OOP snobs Looooooove uniqueidentifier.
    In their schema http://forums.databasejournal.com/clear.gif, all tables except "customer" and "question_collection_type" use uniqueidentifier as primary key.

    Also, why avoid searching varchar field? Even if it's indexed?

    Also, I noticed significant data page decrease changing from datetime datatype to smalldatetime. I think that it is important to keep data growth minimum as possible for table especially like "Response".

    Is uniqueidentifier quite expensive for data storage? It looks like it is more expensive than integer for sure....

  7. #7
    Join Date
    Feb 2003
    Posts
    1,048
    Searching on any type of string field is very slow, even if it's indexed.


    A uniqueidentifier uses 16 bytes of memory wheras an int only uses 4. More importantly than the space it uses, is that it is considerably slower than using an int field (see below). So you have to ask them which is more important to them, that their ID's are unique in the whole world (something that they probably think is "cool) or performance.

    According to BOL:

    The main advantage of the uniqueidentifier data type is that the values generated by the Transact-SQL NEWID function or the application GUID functions are guaranteed to be unique throughout the world.

    The uniqueidentifier data type has several disadvantages:

    • The values are long and obscure. This makes them difficult for users to type correctly, and more difficult for users to remember.

    • The values are random and cannot accept any patterns that may make them more meaningful to users.

    • There is no way to determine the sequence in which uniqueidentifier values were generated. They are not suited for existing applications that depend on incrementing key values serially.

    • At 16 bytes, the uniqueidentifier data type is relatively large compared to other data types such as 4-byte integers. This means indexes built using uniqueidentifier keys may be relatively slower than implementing the indexes using an int key.

  8. #8
    Join Date
    May 2004
    Location
    Boston, MA
    Posts
    19
    Thanks a lot for your help Rawhide.

    I really learned a lot from this discussion with you.

    I think I armored myself with good facts to go against my director who is also OOP snobs.
    I hope he accepts my schema.


    Thanks.

  9. #9
    Join Date
    Feb 2003
    Posts
    1,048
    If it helps your argument any, you can tell him that the person you talked to is both a dba and a developer.

  10. #10
    Join Date
    May 2004
    Location
    Boston, MA
    Posts
    19

    sql_variant data type

    Rawhide,
    One more request for your opinion.

    In "response" table, I am thinking of using sql_variant data type instead of nvarchar(4000) in "response_value" column. Reason is that questions vary from asking for details description to selecting single select multiple question with integer value associated with it.

    After my analysis, it turned out that it is slightly more expensive with sql_variant to store than nvarchar(4000), but query performance looked to be significantly robust with sql_variant data type.

    Did you have any pro&con experience involving with sql_variant data type?
    Last edited by DBAStriker; 12-15-2004 at 04:48 PM.

  11. #11
    Join Date
    Feb 2003
    Posts
    1,048
    Can't really say. I've never found a good reason to use sql_variant. I don't really see any real benefit from using it, but it has a lot of special rules that make using it a pain in the rear.

  12. #12
    Join Date
    May 2004
    Location
    Boston, MA
    Posts
    19
    Actually, I found it useful.

    I've created 2 indentical table with 1000000 rows of text values and 1000000 rows of integer values.
    Only difference between 2 tables is that one table stores those value as nvarchar(4000) and the other stores as sql_variant.

    Then I did search by
    1. integer
    2. text

    And here's the performance result.

    http://www.geocities.com/axkixx/SqlVariants.htm

    What do you think?

  13. #13
    Join Date
    Feb 2003
    Posts
    1,048
    What queries did you use for the test.

  14. #14
    Join Date
    May 2004
    Location
    Boston, MA
    Posts
    19
    I've created 2 tables
    Code:
    /****************
       Table 1
    *****************/
    CREATE TABLE TEST_1([id] int identity, data_value nvarchar(4000))
    
    /****************
       Table 2
    *****************/
    CREATE TABLE TEST_2([id] int identity, data_value sql_variant)
    Then inserted 1000000 rows of integer and 1000000 rows of text data to both table.

    Then tested following queries 4 times.
    Code:
    SELECT *
    FROM test_1
    WHERE data_value = '12345'
    
    SELECT *
    FROM test_2
    WHERE data_value = 12345
    
    
    SELECT *
    FROM test_1
    WHERE data_value = 'John'
    
    SELECT *
    FROM test_2
    WHERE data_value = 'John'

  15. #15
    Join Date
    Feb 2003
    Posts
    1,048
    Tell me this, what results do you get when you run the following queries against the tables. Not performance results, but the actual response from the query?

    SELECT count(*)
    FROM test_1
    WHERE data_value > 'John'

    SELECT count(*)
    FROM test_2
    WHERE data_value > 'John'



    You see a slight discrepency there, don't you???

    The rules for how numeric data is handled have suddenly been reversed.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •