A very slow query

yana · 10-12-2000, 08:26 PM

Hi,

I have a query that takes minutes to execute, even through there are about 300,000 records are being processed. I would appreciate any help with optimizing that query.
I have two tables: User and Usage. Table user has two fields: User_Id and Date_Created and a non-clustered index on User_Id. Table usage has two fields also: User_Id and Date_Used and non-clustered index on both fields. The User table is populated when the user registers. The Usage table is populated every time the user opens a document.

Here is what I need to do: get the number of users from the Usage table who opened a document at least once after they have registered during the last 30 days for each day in the time frame, where the time frame varies.
For example, if the time frame is 8/01/00 - 8/31/00, I need to get the following data:

date returns
---- -------
8/01/00 10 (10 users returned to the document between 7/2/00 and 8/1/00)
8/02/00 15 (15 users returned between 7/3/00 and 8/02/00)
.
.
.
8/31/00 20 (20 users returned between 8/1/00 and 8/31/00)

Here is my query:

SELECT [date],
(SELECT count(distinct user_id)
FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
AND u.[date]>[user].date_created
GROUP BY usage.[date])returns
FROM usage
WHERE [date] BETWEEN @date1 AND @date2

This query works fine, but too slow. We use MS SQL server 7.0.

Thank you,
Yana

Jim W · 10-13-2000, 08:25 AM

I'm a bit confused by what this subquery is trying to do:

SELECT count(distinct user_id)
FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
AND u.[date]>[user].date_created
GROUP BY usage.[date]

the u.[date] field seems to be the usage table, so you are querying the usage date being between the usage date-30 and the usage date. That might be a typo when you copied it to this board though.

Basically with your subquery, you are going to have a query run for every date in the date range. You can try speeding it up either by running the index analyser on the subquery, or by eliminating the subquery entirely.

From your query, it seems that the usage table has one row for every date and user.

Something like this might work faster:

set nocount on

select date, uu.userid
into #temp
from usage u
join users uu on uu.userid = u.userid
and u.createdate between uu.date and dateadd(dd, uu.date, 30)
WHERE u.date BETWEEN @date1 AND @date2

set nocount off
select date, count(*)
from #temp
group by date
order by date

drop table #temp

You will only get the dates with more than 0 but that was somewhat true with your query too. Your query returns any date that -anyone- opened a document. This returns only dates that has a valid user (one within 30 days of creation)

------------
yana at 10/12/00 9:26:33 PM

Hi,

I have a query that takes minutes to execute, even through there are about 300,000 records are being processed. I would appreciate any help with optimizing that query.
I have two tables: User and Usage. Table user has two fields: User_Id and Date_Created and a non-clustered index on User_Id. Table usage has two fields also: User_Id and Date_Used and non-clustered index on both fields. The User table is populated when the user registers. The Usage table is populated every time the user opens a document.

Here is what I need to do: get the number of users from the Usage table who opened a document at least once after they have registered during the last 30 days for each day in the time frame, where the time frame varies.
For example, if the time frame is 8/01/00 - 8/31/00, I need to get the following data:

date returns
---- -------
8/01/00 10 (10 users returned to the document between 7/2/00 and 8/1/00)
8/02/00 15 (15 users returned between 7/3/00 and 8/02/00)
.
.
.
8/31/00 20 (20 users returned between 8/1/00 and 8/31/00)

Here is my query:

SELECT [date],
(SELECT count(distinct user_id)
FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
AND u.[date]>[user].date_created
GROUP BY usage.[date])returns
FROM usage
WHERE [date] BETWEEN @date1 AND @date2

This query works fine, but too slow. We use MS SQL server 7.0.

Thank you,
Yana

yana · 10-13-2000, 12:39 PM

Hi Jim,

Thank you very much for your help.
It works much faster now.

Yana
------------
Jim W at 10/13/00 9:25:22 AM

I'm a bit confused by what this subquery is trying to do:

SELECT count(distinct user_id)
FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
AND u.[date]>[user].date_created
GROUP BY usage.[date]

the u.[date] field seems to be the usage table, so you are querying the usage date being between the usage date-30 and the usage date. That might be a typo when you copied it to this board though.

Basically with your subquery, you are going to have a query run for every date in the date range. You can try speeding it up either by running the index analyser on the subquery, or by eliminating the subquery entirely.

From your query, it seems that the usage table has one row for every date and user.

Something like this might work faster:

set nocount on

select date, uu.userid
into #temp
from usage u
join users uu on uu.userid = u.userid
and u.createdate between uu.date and dateadd(dd, uu.date, 30)
WHERE u.date BETWEEN @date1 AND @date2

set nocount off
select date, count(*)
from #temp
group by date
order by date

drop table #temp

You will only get the dates with more than 0 but that was somewhat true with your query too. Your query returns any date that -anyone- opened a document. This returns only dates that has a valid user (one within 30 days of creation)

------------
yana at 10/12/00 9:26:33 PM

Hi,

I have a query that takes minutes to execute, even through there are about 300,000 records are being processed. I would appreciate any help with optimizing that query.
I have two tables: User and Usage. Table user has two fields: User_Id and Date_Created and a non-clustered index on User_Id. Table usage has two fields also: User_Id and Date_Used and non-clustered index on both fields. The User table is populated when the user registers. The Usage table is populated every time the user opens a document.

Here is what I need to do: get the number of users from the Usage table who opened a document at least once after they have registered during the last 30 days for each day in the time frame, where the time frame varies.
For example, if the time frame is 8/01/00 - 8/31/00, I need to get the following data:

date returns
---- -------
8/01/00 10 (10 users returned to the document between 7/2/00 and 8/1/00)
8/02/00 15 (15 users returned between 7/3/00 and 8/02/00)
.
.
.
8/31/00 20 (20 users returned between 8/1/00 and 8/31/00)

Here is my query:

SELECT [date],
(SELECT count(distinct user_id)
FROM usage u JOIN [user] ON u.[user_id] = [user].[user_id]
WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
AND u.[date]>[user].date_created
GROUP BY usage.[date])returns
FROM usage
WHERE [date] BETWEEN @date1 AND @date2

This query works fine, but too slow. We use MS SQL server 7.0.

Thank you,
Yana

Thread: A very slow query

Thread Tools

Display

A very slow query

A very slow query (reply)

A very slow query (reply)

Posting Permissions