Results 1 to 9 of 9

Thread: Efficient Query Needed

  1. #1
    Join Date
    Jul 2004
    Posts
    17

    Efficient Query Needed: DISTINCT v GROUP BY v Subselects

    I will be running several hundred queries, and I'd like to make sure I get this as efficient as possible.

    Here are 3 different queries (which all supply the same solution) that I am considering, and would like an opinion as to:
    1 - which is more efficient
    2 - is there something else more efficient

    A)
    select count(distinct COL1) as Cardinality from TABLE1;

    B)
    select count(COL1) as Cardinality from (select distinct COL1 from TABLE1);

    C)
    select sum(distinctcol) as Cardinality from (select 1 as distinctcol from TABLE1 group by COL1);

    In my testing, B) is faster than A).

    I thought using GROUP BY (solution C) would be better than using DISTINCT (solution B), but that doesn't appear to be the case. I think the group by forces a sort, which I don't really need, and maybe that causes an inefficiency?

    Looking forward to hearing from the experts.

    Tom
    Last edited by tmasc; 08-05-2004 at 12:41 PM.

  2. #2
    Join Date
    Mar 2003
    Posts
    468
    if this is Oracle i personally would like to see autotrace on and the results of those queries.

    FYI, distinct also causes a sort.

    personally i would assume A would perform the fastes even though they are all producing sorts and seem from a birds eye view that they all have function calls that should evaluate to be the same.

    question:
    did you flush the cache before each of these executions?

    please send on the trace results.

  3. #3
    Join Date
    Feb 2003
    Posts
    1,048
    Interesting question. I rewrote your 3 queries in SQL Server. SQL Server outputs the exact same execution plan for all 3 queries. SQL Server executes teh query in the exact same manner for all 3 queries.

    The first time I ran one of the queries, it ran in 7 seconds. After that, no matter which query I ran, it took 2 seconds. Access may not be smart enough to figure out that they are all 3 basically the same query though.

    What did cause a noticeable difference in execution time was whether the field I used in place of Col1 was indexed or not. So make sure that you have an index on the field.

  4. #4
    Join Date
    Jul 2004
    Posts
    17
    I am at a client site, and I'm really only limited to doing SELECTs. Putting indexes is not an option, since these are prod tables, and again, I have nothing more than SELECT access.

    When I run something on my own database (with just simple test data), all 3 queries run right around the same. As I change my data so that I have different ratios of unique records to total records, again, not much difference.

    I was "warned" about the inefficiencies of DISTINCT, but, I don't see it in my tests. There must be some particular set of circumstances where DISTINCT does lose its efficiency, but it doesn't seem to apply to what I'm doing.

  5. #5
    Join Date
    Feb 2003
    Posts
    1,048
    I don't think that "inefficiencies" is the right term. It does add a certain amount of performance drain, but it's not like there's a more efficient way to do it when that is what you need to do.

    It looks like you are using Access. Is that correct? If not, what database are you using?

    How are you using the resulting recordsets? It may be more efficient to query for COL1 and then count the distinct values in your client-side code.

  6. #6
    Join Date
    Jul 2004
    Posts
    17
    I am using Oracle.

    Method c) does not use DISTINCT, but I decided to run one with DISTINCT just to see how bad DISTINCT really is. The result of that is what made me start this thread.

    The resulting recordset will not be used beyond that. I am using this information to create a data profile repository that I will be using for data migration.

  7. #7
    Join Date
    Feb 2003
    Posts
    1,048
    Well, I can't speak to Oracle. I've never used it. My expertise is in MS SQL Server, Access, and MySQL. But I think that certain generalities are true no matter what db you are using.

    If the query that you are using is efficient, yet the process is still slow, someone needs to look at the architecture. I realize that you can't do that yourself, but if your query is optimized to the best of your ability, then you have something to go to the client with regarding having someone look at their architecture.

  8. #8
    Join Date
    Jul 2004
    Posts
    17
    In one of the Oracle magazines, it shows the following:

    Remember that sorts occur whenever the following actions take place:


    -Index creation
    -Group by or order by statements
    -Use of the distinct operator
    -Join operations
    -Use of the union, intersect, and minus set operators

    So, this supports jkoopmann's statement, as well as my results, with DISTINCT and GROUP BY.

    Now, I'm looking at UNION, and that's making me think about whether it uses the sort. I can see by the above quote, and just thinking about it, that it should, since UNION is really UNION DISTINCT.

    But, what about UNION ALL? There should be no reason that it actually does a sort, since we're not trying to match anything. Running a test, and it certainly doesn't look like UNION ALL uses a sort, unlike UNION. And, the results I get from UNION ALL are not sorted in any case.

    ***

    The reason I ask is that I want to run:
    select 'COL1' as ELEMENT, count(distinct COL1) as Cardinality from TABLE1 UNION ALL

    select 'COL2' as ELEMENT, count(distinct COL2) as Cardinality from TABLE1
    ...

    I'm going to have n-hundred selects all being UNION ALLed, resulting in n-hundred rows.

    Essentially, I want to get the cardinality for each column, and spit it out in one result.

    (I could append the result into a table, but I don't have CREATE TABLE rights.)

    As far as I can tell, this should work exactly the same as issuing a select call, one after the other, with no extra performance impact.

    If there are any thoughts on other ways to accomplish this, please let me know. Otherwise, I'll be running this script.

  9. #9
    Join Date
    Feb 2003
    Posts
    1,048
    You are correct about union all not using a sort when it is processed.

    Does Oracle have an equivalent to the SQL Server table data type? If so, you can use a variable of data type table as if it was an actual table, but it's actually faster because the table is in memory and not written to the disk.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •