Results 1 to 3 of 3

Thread: A very slow query

  1. #1
    yana Guest

    A very slow query


    Hi,

    I have a query that takes minutes to execute, even through there are about 300,000 records are being processed. I would appreciate any help with optimizing that query.
    I have two tables: User and Usage. Table user has two fields: User_Id and Date_Created and a non-clustered index on User_Id. Table usage has two fields also: User_Id and Date_Used and non-clustered index on both fields. The User table is populated when the user registers. The Usage table is populated every time the user opens a document.

    Here is what I need to do: get the number of users from the Usage table who opened a document at least once after they have registered during the last 30 days for each day in the time frame, where the time frame varies.
    For example, if the time frame is 8/01/00 - 8/31/00, I need to get the following data:

    date returns
    ---- -------
    8/01/00 10 (10 users returned to the document between 7/2/00 and 8/1/00)
    8/02/00 15 (15 users returned between 7/3/00 and 8/02/00)
    .
    .
    .
    8/31/00 20 (20 users returned between 8/1/00 and 8/31/00)

    Here is my query:

    SELECT [date],
    (SELECT count(distinct user_id)
    FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
    WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
    AND u.[date]>[user].date_created
    GROUP BY usage.[date])returns
    FROM usage
    WHERE [date] BETWEEN @date1 AND @date2

    This query works fine, but too slow. We use MS SQL server 7.0.

    Thank you,
    Yana





  2. #2
    Jim W Guest

    A very slow query (reply)

    I'm a bit confused by what this subquery is trying to do:

    SELECT count(distinct user_id)
    FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
    WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
    AND u.[date]>[user].date_created
    GROUP BY usage.[date]

    the u.[date] field seems to be the usage table, so you are querying the usage date being between the usage date-30 and the usage date. That might be a typo when you copied it to this board though.

    Basically with your subquery, you are going to have a query run for every date in the date range. You can try speeding it up either by running the index analyser on the subquery, or by eliminating the subquery entirely.

    From your query, it seems that the usage table has one row for every date and user.

    Something like this might work faster:

    set nocount on

    select date, uu.userid
    into #temp
    from usage u
    join users uu on uu.userid = u.userid
    and u.createdate between uu.date and dateadd(dd, uu.date, 30)
    WHERE u.date BETWEEN @date1 AND @date2

    set nocount off
    select date, count(*)
    from #temp
    group by date
    order by date

    drop table #temp

    You will only get the dates with more than 0 but that was somewhat true with your query too. Your query returns any date that -anyone- opened a document. This returns only dates that has a valid user (one within 30 days of creation)

    ------------
    yana at 10/12/00 9:26:33 PM


    Hi,

    I have a query that takes minutes to execute, even through there are about 300,000 records are being processed. I would appreciate any help with optimizing that query.
    I have two tables: User and Usage. Table user has two fields: User_Id and Date_Created and a non-clustered index on User_Id. Table usage has two fields also: User_Id and Date_Used and non-clustered index on both fields. The User table is populated when the user registers. The Usage table is populated every time the user opens a document.

    Here is what I need to do: get the number of users from the Usage table who opened a document at least once after they have registered during the last 30 days for each day in the time frame, where the time frame varies.
    For example, if the time frame is 8/01/00 - 8/31/00, I need to get the following data:

    date returns
    ---- -------
    8/01/00 10 (10 users returned to the document between 7/2/00 and 8/1/00)
    8/02/00 15 (15 users returned between 7/3/00 and 8/02/00)
    .
    .
    .
    8/31/00 20 (20 users returned between 8/1/00 and 8/31/00)

    Here is my query:

    SELECT [date],
    (SELECT count(distinct user_id)
    FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
    WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
    AND u.[date]>[user].date_created
    GROUP BY usage.[date])returns
    FROM usage
    WHERE [date] BETWEEN @date1 AND @date2

    This query works fine, but too slow. We use MS SQL server 7.0.

    Thank you,
    Yana





  3. #3
    yana Guest

    A very slow query (reply)

    Hi Jim,

    Thank you very much for your help.
    It works much faster now.

    Yana
    ------------
    Jim W at 10/13/00 9:25:22 AM

    I'm a bit confused by what this subquery is trying to do:

    SELECT count(distinct user_id)
    FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
    WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
    AND u.[date]>[user].date_created
    GROUP BY usage.[date]

    the u.[date] field seems to be the usage table, so you are querying the usage date being between the usage date-30 and the usage date. That might be a typo when you copied it to this board though.

    Basically with your subquery, you are going to have a query run for every date in the date range. You can try speeding it up either by running the index analyser on the subquery, or by eliminating the subquery entirely.

    From your query, it seems that the usage table has one row for every date and user.

    Something like this might work faster:

    set nocount on

    select date, uu.userid
    into #temp
    from usage u
    join users uu on uu.userid = u.userid
    and u.createdate between uu.date and dateadd(dd, uu.date, 30)
    WHERE u.date BETWEEN @date1 AND @date2

    set nocount off
    select date, count(*)
    from #temp
    group by date
    order by date

    drop table #temp

    You will only get the dates with more than 0 but that was somewhat true with your query too. Your query returns any date that -anyone- opened a document. This returns only dates that has a valid user (one within 30 days of creation)

    ------------
    yana at 10/12/00 9:26:33 PM


    Hi,

    I have a query that takes minutes to execute, even through there are about 300,000 records are being processed. I would appreciate any help with optimizing that query.
    I have two tables: User and Usage. Table user has two fields: User_Id and Date_Created and a non-clustered index on User_Id. Table usage has two fields also: User_Id and Date_Used and non-clustered index on both fields. The User table is populated when the user registers. The Usage table is populated every time the user opens a document.

    Here is what I need to do: get the number of users from the Usage table who opened a document at least once after they have registered during the last 30 days for each day in the time frame, where the time frame varies.
    For example, if the time frame is 8/01/00 - 8/31/00, I need to get the following data:

    date returns
    ---- -------
    8/01/00 10 (10 users returned to the document between 7/2/00 and 8/1/00)
    8/02/00 15 (15 users returned between 7/3/00 and 8/02/00)
    .
    .
    .
    8/31/00 20 (20 users returned between 8/1/00 and 8/31/00)

    Here is my query:

    SELECT [date],
    (SELECT count(distinct user_id)
    FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
    WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
    AND u.[date]>[user].date_created
    GROUP BY usage.[date])returns
    FROM usage
    WHERE [date] BETWEEN @date1 AND @date2

    This query works fine, but too slow. We use MS SQL server 7.0.

    Thank you,
    Yana





Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •