Thread: Interesting challenge

  1. #1
    Join Date
    Jul 2005
    Posts
    11

    Interesting challenge

    We're having to work with some legacy data. The tables in the so-called database seem to have way more NULLs than actual data. One table appears to have 100 or more columns and close to 40,000 rows.

    This Db has pretty much 0 normalization present.

    IOW, your worst nightmare.

    Is there a way we could run a query that would return the total number of cells that contain NULL, and another that would return the total number of data-bearing cells, so we could come up with a percentage or a ratio?

  2. #2
    Join Date
    Sep 2002
    Location
    Fantasy
    Posts
    4,254
    When you say legacy data, what does the format look like?

    You can import that into a SQL database and run all the queries. It's only 40,000 rows; it won't take that long to import if the source file is in a standard, consistent format.

  3. #3
    Join Date
    Jul 2005
    Posts
    2
    If you get the data into MS SQL Server, then you can use the combination of sysobjects and syscolumns to write a dynamic query to find out how many NULLs there are. It's not a trivial thing, but it's not too hard either.
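
    A minimal sketch of that idea, assuming the SQL Server 2000 catalog tables; 'YourWideTable' is just a placeholder name. Run this, then run the statements it prints to get a NULL count per column:

    Code:
    -- Generate one counting statement per column of a user table,
    -- using the SQL Server 2000 system tables sysobjects and syscolumns.
    select 'select ''' + c.name + ''' as ColumnName, count(*) as NullCount from ' + o.name + ' where [' + c.name + '] is null'
    from sysobjects o
    join syscolumns c on c.id = o.id
    where o.name = 'YourWideTable'
      and o.xtype = 'U'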

  4. #4
    Join Date
    Jul 2005
    Posts
    1

    Simple solution

    Hi Chaz. Over the years I've converted many databases to SQL Server. Often it's better to think simpler. I use Access as my primary data conversion tool for tables of less than, say, 500,000 rows.

    It is easier and faster to create queries and software to manipulate and correct data. You can transfer legacy data to new normalised tables, then upload the tables to SQL Server - or just use link tables. The only thing you have to watch is the use of primary keys when matching old to new (identity) keys.

    Anyway, your problem will take about 10 minutes to solve in Access. Simply import/link the data, create an Access query, select each column into the analyser, then use the "Totals/Count" option for each column.

    I just tested it for 60+ columns and it works fine (about 3 minutes for me to dummy up and test the solution - took longer to register and submit this comment.)

    You take the results, export to Excel (which is of course even simpler for stats analysis), then divide the total by (number of rows * number of columns) - or give a total against each column, etc.

    Probably no more than 10 minutes to get the result.
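
    In SQL terms, the Totals/Count query amounts to something like the sketch below (the table and column names are placeholders), and it runs the same way whether the table is linked in Access or sitting in SQL Server:

    Code:
    SELECT Count(*) AS TotalRows,
           Count([Col1]) AS Col1_Filled,
           Count([Col2]) AS Col2_Filled
    FROM [LegacyTable];

    Count(*) counts every row while Count([column]) skips NULLs, so the share of data-bearing cells is the sum of the _Filled counts divided by (TotalRows * number of columns), and the NULL percentage is 1 minus that.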

    Main point is that SQL Server is not the only tool available to assist in data conversions.

    Cheers & Good luck,

    David Maxwell
    Last edited by davidms; 07-12-2005 at 10:18 PM.

  5. #5
    Join Date
    Sep 2002
    Posts
    5,938
    Does the so-called DB have a so-called query tool? If so, you should be able to analyze the data where it is.

  6. #6
    Join Date
    Jul 2005
    Posts
    11
    Thanx for all the responses.

    The Db in question is on SQL Server 2000. It has to be the worst, gnarliest, most cocked-up DB I've ever seen. I'm amazed that the goofy thing works at all. It would be funny were it not for the importance and sensitivity of the data it contains.

    One table I mentioned has around 38k rows and around 100 columns. The ratio of NULL fields to fields that actually contain data runs over 10:1 (a formula for which is why I originally posted). This DB is full of similar tables.

    We'd like to make a case for blocking sufficient time and resources for a complete, ground-up rebuild of the Db to properly normalize it, and hopefully improve the performance while reducing the system resources currently required to support it.

  7. #7
    Join Date
    Sep 2002
    Location
    Fantasy
    Posts
    4,254
    -- This will generate a list of SQL statements. When those statements get executed, look for results of zero. Those are the columns that have no data.

    select 'Select "'+name+'",count(*) as rows from authors where '+name+' is not null' from syscolumns where id=object_id('authors')
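
    For example, against the pubs sample database's authors table (which is what the script above references), the generated statements come out like:

    Code:
    Select "au_id",count(*) as rows from authors where au_id is not null
    Select "au_lname",count(*) as rows from authors where au_lname is not null
    ...

    Each one returns the number of rows where that column actually holds data (run them with QUOTED_IDENTIFIER OFF so the double-quoted name comes back as a literal label).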

  8. #8
    Join Date
    Sep 2002
    Location
    Fantasy
    Posts
    4,254
    Please change 'authors' to your table name.

  9. #9
    Join Date
    Jul 2005
    Posts
    1
    Expanding Mak's script, you could do:

    select 'Select "'+name+'",count(*) as total,count('+name+') as nonNull,count(*)-count('+name+') as Nulls from authors' from syscolumns where id=object_id('authors')
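
    The trick that makes the WHERE clause unnecessary is that count(*) counts every row while count(column) skips NULLs, so one pass per column yields all three figures. A throwaway illustration (the temp table here is purely hypothetical):

    Code:
    create table #demo (col int)
    insert into #demo values (1)
    insert into #demo values (NULL)
    -- total = 2, nonNull = 1, Nulls = 1
    select count(*) as total, count(col) as nonNull, count(*) - count(col) as Nulls from #demo
    drop table #demo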

    G

  10. #10
    Join Date
    Jul 2005
    Posts
    11
    ... or, expanding (too) much more, to do it in just one operation:

    Code:
    -- For every column of @TableName, gather total rows, non-NULL count,
    -- NULL count and fill ratio into a temp table, then summarise with ROLLUP.
    declare @ColumnName varchar(100), @string nvarchar(4000), @TableName sysname
    create table #xxx (ColumnName varchar(100), total int, nonNull int, Nulls int, Ratio decimal(5,2))
    
    set @TableName = 'publishers'
    
    -- cursor over the column names of the target table
    declare ccc cursor for
    	select name from syscolumns where id=object_id(@TableName)
    
    open ccc
    fetch next from ccc into @ColumnName
    
    while @@fetch_status = 0
    begin
    	-- one aggregate pass per column; count(column) ignores NULLs.
    	-- The counts are cast to decimal(18,2) so tables with more than a few
    	-- hundred rows do not overflow the ratio arithmetic.
    	set @string = 'insert into #xxx
    			Select ''' + @ColumnName + ''' as ColumnName, 
    			count(*) as total,
    			count([' + @ColumnName + ']) as nonNull,
    			count(*)-count([' + @ColumnName + ']) as Nulls,
    			cast(count([' + @ColumnName + ']) as decimal(18,2)) / 
    				cast(count(*) as decimal(18,2)) 
    			as Ratio
    			from ' + @TableName
    
    	exec sp_executesql @string
    	fetch next from ccc into @ColumnName
    end
    
    close ccc
    deallocate ccc
    
    -- per-column figures plus a grand-total line courtesy of WITH ROLLUP
    select 	CASE WHEN (GROUPING(ColumnName) = 1) THEN 'TOTAL.............'
            	ELSE ColumnName
           	END AS ColumnName, 
    	sum(total) as total, 
    	sum(nonNull) as nonNull, 
    	sum(Nulls) as Nulls,
    	cast(avg(Ratio) as decimal(5,2)) as Ratio
    from #xxx
    group by ColumnName
    	WITH ROLLUP
    
    drop table #xxx
    The results will be:

    Code:
    ColumnName		total	nonNull	Nulls	Ratio 
    
    city			8	8	0	1.00
    country			8	8	0	1.00
    pub_id			8	8	0	1.00
    pub_name		8	8	0	1.00
    state			8	6	2	.75
    
    TOTAL.............	40	38	2	.95
