Can we use Analysis service to create a cube for huge records

**krajeshkumar** · 04-18-2007, 04:46 PM

Hi,

I am having a requirement to create a cube from a table which has more than 60 million records. The main purpose of creating a cube is to generate a report.
So please can any one suggest me weather i can create a Cube from this big table.

Hi,

Thanks a lot... Here are some more details

I am new to understand the cube.

I am trying to create a dimension on fact table itself. So in my case i am refering to same table for both fact and dimension.

This is my table
PRES_SK
PRES_DISP_SK
PRES_DISPENSER_DEA_NO
PRES_LOAD_SK
PRES_RX_NO
PRES_DEA_SUFFIX
PRES_CUSTOMER_ID
PRES_PATIENT_FIRST_NAME
PRES_PATIENT_LAST_NAME
PRES_PATIENT_DOB
PRES_SEX_CODE
PRES_PATIENT_STREET
PRES_PATIENT_STATE
PRES_PATIENT_ZIP
PRES_RECORD_NUMBER
PRES_DRUG_SK
PRES_NDC_CODE
PRES_DATE_FILLED
PRES_NEW_REFILL_CODE
PRES_QUANTITY
PRES_DAYS_SUPPLY
PRES_AUTHORIZED_REFILLS
PRES_COMPOUND_CODE
PRES_ISSUE_DATE
PRES_PRCT_SK
PRES_PRACTITIONER_DEA_NO
PRES_PAYMENT_TYPE
PRES_PHARMACY_EMAIL
PRES_PERIOD_FROM
PRES_PERIOD_TO
PRES_STATUS
PRES_DELETE
PRES_DATE_CREATED
PRES_USER_CREATED
PRES_DATE_MODIFIED
PRES_USER_MODIFIED
PRES_IP_ADDRESS
PRES_ALIAS_SK
PRES_ALIAS_DATE_CREATED
PRES_ALIAS_USER_CREATED
Prct_Dea_No
Prct_First_Name
Prct_Last_Name
Prct_City
Prct_State
Prct_Zip_Code
PRCT_ADDRESS
Disp_Dea_No
Disp_First_Name
Disp_Last_Name
Disp_City
Disp_State
Disp_Zip_code
DISP_ADDRESS
Drug_Dea_Class
Drug_Product_name
Drug_Manufacturer
FORM_QUANTITY
STRENGTH

The Fact is on
PRES_LOAD_SK

The dimensions are created based on
dim1
Prct_Dea_No
Prct_First_Name
Prct_Last_Name
Prct_City
Prct_State
Prct_Zip_Code
PRCT_ADDRESS

dim2
Disp_Dea_No
Disp_First_Name
Disp_Last_Name
Disp_City
Disp_State
Disp_Zip_code
DISP_ADDRESS

dim3
Drug_Dea_Class
Drug_Product_name
Drug_Manufacturer

dim4
PRES_ISSUE_DATE
PRES_PERIOD_FROM
PRES_PERIOD_TO
FORM_QUANTITY
STRENGTH
PRES_RX_NO

so based on these diminsion i am trying to fetch the details more over i am using Linked Server to fetch the details from this cube. this is the query i am using

Select * FROM OpenRowSet('MSOLAP','DATASOURCE=Dev2k3-2; USER ID=;
PASSWORD=;Initial Catalog=Maps_2006;','
Select
[PRES_PATIENT:Pres Patient First Name],
[PRES_PATIENT:Pres Patient Last Name],
[PRES_PATIENT:Pres Patient Address],
[PRES_QTY:PRES ISSUE DATE],
[PRES_QTY:Pres Date Filled],
[DRUG_NDC_CODE

rug Product Name],
[PRES_QTY:Form Quantity],
[PRES_QTY:STRENGTH],
[PRES_QTY:Pres Rx No],
[PRCT_DEA_NO:Prct FIrst Name],
[PRCT_DEA_NO:Prct Last Name],
[PRCT_DEA_NO:Prct State],
[PRCT_DEA_NO:Prct City],
[PRCT_DEA_NO:Prct Zip Code],
[DISP_DEA_NO

isp Last Name],
[DISP_DEA_NO

isp Dea No],
[DISP_DEA_NO

isp ADDRESS],
[DISP_DEA_NO

isp Zip Code],
[DRUG_NDC_CODE

RUG DEA CLASS]
from
[Prescriptions]
WHERE

[PRCT_DEA_NO:Prct Dea NO] = ''AW7444160''
')

Please help me to create a cube.

Rajesh Kumar

**Island1** · 04-18-2007, 08:26 PM

I find it difficult to believe that every row needs to be pulled into the cube. You don't indicate, either, if the large table is a fact or dim table (or whether you know the difference ...(.

You can filter the induction queries, etc., to make it more manageable. You can also partition with any degree of control you need - even across multiple servers.

Just pulling in all the rows would very likely be a bad design for many reasons, however. (This sounds like a transaction table ...) Let us know as you proceed and have more specific questions about your design ... If you give a little more detail about the business need, I'm sure we can offer some worthwhile options ...

Good Luck.

Bill

**rmiao** · 04-19-2007, 07:57 AM

Yes, you can build cube from big fact table. But you need big server for that.

**krajeshkumar** · 04-19-2007, 08:34 AM

Hi,

Thanks a lot... Here are some more details

I am new to understand the cube.

I am trying to create a dimension on fact table itself. So in my case i am refering to same table for both fact and dimension.

This is my table
PRES_SK
PRES_DISP_SK
PRES_DISPENSER_DEA_NO
PRES_LOAD_SK
PRES_RX_NO
PRES_DEA_SUFFIX
PRES_CUSTOMER_ID
PRES_PATIENT_FIRST_NAME
PRES_PATIENT_LAST_NAME
PRES_PATIENT_DOB
PRES_SEX_CODE
PRES_PATIENT_STREET
PRES_PATIENT_STATE
PRES_PATIENT_ZIP
PRES_RECORD_NUMBER
PRES_DRUG_SK
PRES_NDC_CODE
PRES_DATE_FILLED
PRES_NEW_REFILL_CODE
PRES_QUANTITY
PRES_DAYS_SUPPLY
PRES_AUTHORIZED_REFILLS
PRES_COMPOUND_CODE
PRES_ISSUE_DATE
PRES_PRCT_SK
PRES_PRACTITIONER_DEA_NO
PRES_PAYMENT_TYPE
PRES_PHARMACY_EMAIL
PRES_PERIOD_FROM
PRES_PERIOD_TO
PRES_STATUS
PRES_DELETE
PRES_DATE_CREATED
PRES_USER_CREATED
PRES_DATE_MODIFIED
PRES_USER_MODIFIED
PRES_IP_ADDRESS
PRES_ALIAS_SK
PRES_ALIAS_DATE_CREATED
PRES_ALIAS_USER_CREATED
Prct_Dea_No
Prct_First_Name
Prct_Last_Name
Prct_City
Prct_State
Prct_Zip_Code
PRCT_ADDRESS
Disp_Dea_No
Disp_First_Name
Disp_Last_Name
Disp_City
Disp_State
Disp_Zip_code
DISP_ADDRESS
Drug_Dea_Class
Drug_Product_name
Drug_Manufacturer
FORM_QUANTITY
STRENGTH

The Fact is on
PRES_LOAD_SK

The dimensions are created based on
dim1
Prct_Dea_No
Prct_First_Name
Prct_Last_Name
Prct_City
Prct_State
Prct_Zip_Code
PRCT_ADDRESS

dim2
Disp_Dea_No
Disp_First_Name
Disp_Last_Name
Disp_City
Disp_State
Disp_Zip_code
DISP_ADDRESS

dim3
Drug_Dea_Class
Drug_Product_name
Drug_Manufacturer

dim4
PRES_ISSUE_DATE
PRES_PERIOD_FROM
PRES_PERIOD_TO
FORM_QUANTITY
STRENGTH
PRES_RX_NO

so based on these diminsion i am trying to fetch the details more over i am using Linked Server to fetch the details from this cube. this is the query i am using

Select * FROM OpenRowSet('MSOLAP','DATASOURCE=Dev2k3-2; USER ID=;
PASSWORD=;Initial Catalog=Maps_2006;','
Select
[PRES_PATIENT:Pres Patient First Name],
[PRES_PATIENT:Pres Patient Last Name],
[PRES_PATIENT:Pres Patient Address],
[PRES_QTY:PRES ISSUE DATE],
[PRES_QTY:Pres Date Filled],
[DRUG_NDC_CODE

rug Product Name],
[PRES_QTY:Form Quantity],
[PRES_QTY:STRENGTH],
[PRES_QTY:Pres Rx No],
[PRCT_DEA_NO:Prct FIrst Name],
[PRCT_DEA_NO:Prct Last Name],
[PRCT_DEA_NO:Prct State],
[PRCT_DEA_NO:Prct City],
[PRCT_DEA_NO:Prct Zip Code],
[DISP_DEA_NO

isp Last Name],
[DISP_DEA_NO

isp Dea No],
[DISP_DEA_NO

isp ADDRESS],
[DISP_DEA_NO

isp Zip Code],
[DRUG_NDC_CODE

RUG DEA CLASS]
from
[Prescriptions]
WHERE

[PRCT_DEA_NO:Prct Dea NO] = ''AW7444160''
')

Please help me to create a cube.

Rajesh Kumar

**Island1** · 04-23-2007, 01:03 PM

Hi:

You ask a couple of places for help in building / designing a cube, and then throw a read query into the mix. Are you asking how to go about designing the cube, based upon the fact table describe, or how to "fetch the details" from an existing cube? (Whether querying directly or via a linked server, the cube has to be designed properly first before querying same, as I'm sure you can appreciate).

Can we take it a single step at a time, so we can give you some useful guidance?

Thanks!

Bill

**krajeshkumar** · 04-23-2007, 03:56 PM

Thanks for your reply, As you have said we will take single step at a time.

As i have show the dim1, dim2... are the dim that i have create. My problem is that if i try to fetch the details from the cube using linked server, for fetching details from the cube takes very very long time. So i need your help to redesign the cube or to figure out a way thru which the performance will be improved.

Thanks in advance.

**Island1** · 04-23-2007, 08:04 PM

Ah, now we get to the real question. "Creating a cube" has evolved to optimizing a query - and I assume asking for guidance in modifying cube structure to optimize run time for that specific query; this kind of leads into part of my response ...

Query optimizaton - be it a direct or linked-server query - is difficult to do without some detail about the cube. What we appear to have here are dimension names, a descripton of a very large fact table, and little else. No idea of hierarchical / attribute structures, no precisely identified measures (I can guess from the list you give, but that might be iffy), etc. We need to focus on aggregation and other settings in the cube to make sure that the cube itself is optimal, before adding the linked-server variable to remove complicating factors.

I would suggest beginning with a direct query on a local machine to eliminate variables. (I have suggestions on how to optimize the query from the "direct" scenario, once you get there ...). Once the cube is optimal there, you would deploy that design to the linked server and begin again with the query as you originally frame it - if you get the query optimal in a direct situation, you can then begin working with it as close to optimal as possible, in the linked server setting, so as to free you to focus on the connectivity issues, etc. - if indeed they still exist (they may not, once the query has been optimized in its direct context) - and if, indeed, the linked-server scenario was every truly necessary from the start ...

If the response is "I have no access / capability of working with a direct query against the cube," and / or "I have no access to the cube (on another server), and have to do all my optimization 'remotely,'" then we have a pretty fundamental obstacle. This is a terrible scenario from which to try to optimize what might be a bad cube design - and especially within the context of a single query that you deem "too slow." You'll never be able to know if the query is "taking too long" or not, as you will have no frame of reference for an optimally performing query.

(Another thing that strikes me is that result dataset of this query would make a report that is challenging to use in a productive way ... ia filtering out of the question (at least in the design stages) - filters that might be parameterized in the end report to make it more consumer friendly anyway?)

Does that make sense? Can you get a handle on optimization with a direct query first? An approach might be to go with a representative subset of the data in a local server scenario, if the data size is the reason you will say you cannot work with the cube locally ...

We'll work through it, provided you have some control over all the pieces ... let me know if you can set up a local "lab" environment where you can eliminate the complications first, as I have tried to describe.

Thanks.

Bill

Thread: Can we use Analysis service to create a cube for huge records

Thread Tools

Display

Can we use Analysis service to create a cube for huge records

SSAS Should Handle It ... But Think This Out ...

I Must Be Missing Something ...

way to increase Performance

Step Number One is to Remove Variables - and Get More Details

Posting Permissions