Thread: representative data sample

  1. #1
    Join Date
    Apr 2005
    Posts
    3

    representative data sample

    Hello,

    Can someone tell me what counts as a representative data sample size? For example, for a database with 10,000 records, what is the minimum number of records needed to provide a representative data sample for testing? We are testing an application that uses an Oracle database.
    The production data sets have too many records to test with.

    Thanks.

    Jerry

  2. #2
    Join Date
    Mar 2003
    Posts
    468
    Jerry,
    A couple of things.
    1. 10K rows is not much and may itself be a good sample size.
    2. Your sample size should represent the conditions you are trying to test. For example, if you are testing the existence of a primary key, then your sample may be just one row. It really depends.
    3. If you are testing performance or growth patterns, then your sample may need to be larger than 10K rows. It should be in line with how your database will grow over time, so your sample size should grow with it.
    4. You also need to answer the question of what the data looks like, and your sample should be representative of that.
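
    As a rough illustration of points 2 and 4, here is a minimal sketch of drawing the same number of random rows from each category, so that every kind of record shows up in the sample. The record layout, the `type` field, and the `per_group` parameter are hypothetical, chosen only for the example; in practice the rows would come from your own tables.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_group, seed=0):
    """Pick up to `per_group` random records from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    sample = []
    for name in sorted(groups):
        members = groups[name]
        # Never ask for more rows than the group actually has.
        k = min(per_group, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Made-up data: 1,000 records spread evenly over 4 hypothetical types.
records = [{"id": i, "type": f"T{i % 4}"} for i in range(1000)]
picked = stratified_sample(records, key=lambda r: r["type"], per_group=10)
print(len(picked))
```

    Sampling per group rather than from the whole table keeps a rare category from being missed by chance.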

  3. #3
    Join Date
    Apr 2005
    Posts
    3

    Followup question

    James,

    Thank you for your prompt reply.

    The example that I used in my original posting probably was not a good one.
    Here is our specific situation.

    We have approx. 140K components of 16 different types, each with multiple relationships. This information was parsed from multiple data sources and uploaded to an Oracle database.
    We are responsible for manually checking some percentage of the components to make sure the data in the source files matches the data in the database. This can only be done manually.
    What would be considered an acceptable sample size per component type, assuming each type has the same number of components?

    Thank you once again.

    Jerry

  4. #4
    Join Date
    Mar 2003
    Posts
    468
    I would suggest you do a quick search on the net for "sample size"; you will find a few sites that help you calculate this number.

    Personally, if all the data is semi-uniform, I usually go with 1-10% of the rows. This really depends on the amount of data: the larger the amount of data, the smaller the percentage I sample.
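
    For reference, many of those online calculators use something like Cochran's formula with a finite population correction. The sketch below assumes the goal is to estimate a proportion (for example, the share of records with a mismatch); the 95% confidence level, 5% margin of error, and worst-case p = 0.5 are common defaults, not values taken from this thread.

```python
import math
from statistics import NormalDist

def sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Rows to sample to estimate a proportion within +/- margin."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for a very large population.
    n0 = z * z * p * (1 - p) / (margin * margin)
    # Finite population correction shrinks n0 for smaller tables.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# The 10K example, the full 140K set, and one of 16 equal-sized types.
for rows in (10_000, 140_000, 140_000 // 16):
    print(rows, sample_size(rows))
```

    Under these assumptions the required count levels off at a few hundred rows as the table grows, which is why the percentage sampled can shrink for larger data sets.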
