Results 1 to 7 of 7

Thread: Can we use Analysis service to create a cube for huge records

  1. #1
    Join Date
    Apr 2007
    Posts
    5

    Can we use Analysis service to create a cube for huge records

    Hi,

    I am having a requirement to create a cube from a table which has more than 60 million records. The main purpose of creating a cube is to generate a report.
    So please can any one suggest me weather i can create a Cube from this big table.

    Hi,

    Thanks a lot... Here are some more details

    I am new to understand the cube.

    I am trying to create a dimension on fact table itself. So in my case i am refering to same table for both fact and dimension.

    This is my table
    PRES_SK
    PRES_DISP_SK
    PRES_DISPENSER_DEA_NO
    PRES_LOAD_SK
    PRES_RX_NO
    PRES_DEA_SUFFIX
    PRES_CUSTOMER_ID
    PRES_PATIENT_FIRST_NAME
    PRES_PATIENT_LAST_NAME
    PRES_PATIENT_DOB
    PRES_SEX_CODE
    PRES_PATIENT_STREET
    PRES_PATIENT_STATE
    PRES_PATIENT_ZIP
    PRES_RECORD_NUMBER
    PRES_DRUG_SK
    PRES_NDC_CODE
    PRES_DATE_FILLED
    PRES_NEW_REFILL_CODE
    PRES_QUANTITY
    PRES_DAYS_SUPPLY
    PRES_AUTHORIZED_REFILLS
    PRES_COMPOUND_CODE
    PRES_ISSUE_DATE
    PRES_PRCT_SK
    PRES_PRACTITIONER_DEA_NO
    PRES_PAYMENT_TYPE
    PRES_PHARMACY_EMAIL
    PRES_PERIOD_FROM
    PRES_PERIOD_TO
    PRES_STATUS
    PRES_DELETE
    PRES_DATE_CREATED
    PRES_USER_CREATED
    PRES_DATE_MODIFIED
    PRES_USER_MODIFIED
    PRES_IP_ADDRESS
    PRES_ALIAS_SK
    PRES_ALIAS_DATE_CREATED
    PRES_ALIAS_USER_CREATED
    Prct_Dea_No
    Prct_First_Name
    Prct_Last_Name
    Prct_City
    Prct_State
    Prct_Zip_Code
    PRCT_ADDRESS
    Disp_Dea_No
    Disp_First_Name
    Disp_Last_Name
    Disp_City
    Disp_State
    Disp_Zip_code
    DISP_ADDRESS
    Drug_Dea_Class
    Drug_Product_name
    Drug_Manufacturer
    FORM_QUANTITY
    STRENGTH

    The Fact is on
    PRES_LOAD_SK

    The dimensions are created based on
    dim1
    Prct_Dea_No
    Prct_First_Name
    Prct_Last_Name
    Prct_City
    Prct_State
    Prct_Zip_Code
    PRCT_ADDRESS

    dim2
    Disp_Dea_No
    Disp_First_Name
    Disp_Last_Name
    Disp_City
    Disp_State
    Disp_Zip_code
    DISP_ADDRESS

    dim3
    Drug_Dea_Class
    Drug_Product_name
    Drug_Manufacturer

    dim4
    PRES_ISSUE_DATE
    PRES_PERIOD_FROM
    PRES_PERIOD_TO
    FORM_QUANTITY
    STRENGTH
    PRES_RX_NO

    so based on these diminsion i am trying to fetch the details more over i am using Linked Server to fetch the details from this cube. this is the query i am using

    Select * FROM OpenRowSet('MSOLAP','DATASOURCE=Dev2k3-2; USER ID=;
    PASSWORD=;Initial Catalog=Maps_2006;','
    Select
    [PRES_PATIENT:Pres Patient First Name],
    [PRES_PATIENT:Pres Patient Last Name],
    [PRES_PATIENT:Pres Patient Address],
    [PRES_QTY:PRES ISSUE DATE],
    [PRES_QTY:Pres Date Filled],
    [DRUG_NDC_CODErug Product Name],
    [PRES_QTY:Form Quantity],
    [PRES_QTY:STRENGTH],
    [PRES_QTY:Pres Rx No],
    [PRCT_DEA_NO:Prct FIrst Name],
    [PRCT_DEA_NO:Prct Last Name],
    [PRCT_DEA_NO:Prct State],
    [PRCT_DEA_NO:Prct City],
    [PRCT_DEA_NO:Prct Zip Code],
    [DISP_DEA_NOisp Last Name],
    [DISP_DEA_NOisp Dea No],
    [DISP_DEA_NOisp ADDRESS],
    [DISP_DEA_NOisp Zip Code],
    [DRUG_NDC_CODERUG DEA CLASS]
    from
    [Prescriptions]
    WHERE

    [PRCT_DEA_NO:Prct Dea NO] = ''AW7444160''
    ')



    Please help me to create a cube.

    Rajesh Kumar
    Last edited by krajeshkumar; 04-19-2007 at 08:39 AM.

  2. #2
    Join Date
    Jun 2004
    Location
    Atlanta and Manhattan
    Posts
    607

    SSAS Should Handle It ... But Think This Out ...

    I find it difficult to believe that every row needs to be pulled into the cube. You don't indicate, either, if the large table is a fact or dim table (or whether you know the difference ...(.

    You can filter the induction queries, etc., to make it more manageable. You can also partition with any degree of control you need - even across multiple servers.

    Just pulling in all the rows would very likely be a bad design for many reasons, however. (This sounds like a transaction table ...) Let us know as you proceed and have more specific questions about your design ... If you give a little more detail about the business need, I'm sure we can offer some worthwhile options ...

    Good Luck.

    Bill

  3. #3
    Join Date
    Sep 2002
    Posts
    5,938
    Yes, you can build cube from big fact table. But you need big server for that.

  4. #4
    Join Date
    Apr 2007
    Posts
    5
    Hi,

    Thanks a lot... Here are some more details

    I am new to understand the cube.

    I am trying to create a dimension on fact table itself. So in my case i am refering to same table for both fact and dimension.

    This is my table
    PRES_SK
    PRES_DISP_SK
    PRES_DISPENSER_DEA_NO
    PRES_LOAD_SK
    PRES_RX_NO
    PRES_DEA_SUFFIX
    PRES_CUSTOMER_ID
    PRES_PATIENT_FIRST_NAME
    PRES_PATIENT_LAST_NAME
    PRES_PATIENT_DOB
    PRES_SEX_CODE
    PRES_PATIENT_STREET
    PRES_PATIENT_STATE
    PRES_PATIENT_ZIP
    PRES_RECORD_NUMBER
    PRES_DRUG_SK
    PRES_NDC_CODE
    PRES_DATE_FILLED
    PRES_NEW_REFILL_CODE
    PRES_QUANTITY
    PRES_DAYS_SUPPLY
    PRES_AUTHORIZED_REFILLS
    PRES_COMPOUND_CODE
    PRES_ISSUE_DATE
    PRES_PRCT_SK
    PRES_PRACTITIONER_DEA_NO
    PRES_PAYMENT_TYPE
    PRES_PHARMACY_EMAIL
    PRES_PERIOD_FROM
    PRES_PERIOD_TO
    PRES_STATUS
    PRES_DELETE
    PRES_DATE_CREATED
    PRES_USER_CREATED
    PRES_DATE_MODIFIED
    PRES_USER_MODIFIED
    PRES_IP_ADDRESS
    PRES_ALIAS_SK
    PRES_ALIAS_DATE_CREATED
    PRES_ALIAS_USER_CREATED
    Prct_Dea_No
    Prct_First_Name
    Prct_Last_Name
    Prct_City
    Prct_State
    Prct_Zip_Code
    PRCT_ADDRESS
    Disp_Dea_No
    Disp_First_Name
    Disp_Last_Name
    Disp_City
    Disp_State
    Disp_Zip_code
    DISP_ADDRESS
    Drug_Dea_Class
    Drug_Product_name
    Drug_Manufacturer
    FORM_QUANTITY
    STRENGTH

    The Fact is on
    PRES_LOAD_SK

    The dimensions are created based on
    dim1
    Prct_Dea_No
    Prct_First_Name
    Prct_Last_Name
    Prct_City
    Prct_State
    Prct_Zip_Code
    PRCT_ADDRESS

    dim2
    Disp_Dea_No
    Disp_First_Name
    Disp_Last_Name
    Disp_City
    Disp_State
    Disp_Zip_code
    DISP_ADDRESS

    dim3
    Drug_Dea_Class
    Drug_Product_name
    Drug_Manufacturer

    dim4
    PRES_ISSUE_DATE
    PRES_PERIOD_FROM
    PRES_PERIOD_TO
    FORM_QUANTITY
    STRENGTH
    PRES_RX_NO

    so based on these diminsion i am trying to fetch the details more over i am using Linked Server to fetch the details from this cube. this is the query i am using

    Select * FROM OpenRowSet('MSOLAP','DATASOURCE=Dev2k3-2; USER ID=;
    PASSWORD=;Initial Catalog=Maps_2006;','
    Select
    [PRES_PATIENT:Pres Patient First Name],
    [PRES_PATIENT:Pres Patient Last Name],
    [PRES_PATIENT:Pres Patient Address],
    [PRES_QTY:PRES ISSUE DATE],
    [PRES_QTY:Pres Date Filled],
    [DRUG_NDC_CODErug Product Name],
    [PRES_QTY:Form Quantity],
    [PRES_QTY:STRENGTH],
    [PRES_QTY:Pres Rx No],
    [PRCT_DEA_NO:Prct FIrst Name],
    [PRCT_DEA_NO:Prct Last Name],
    [PRCT_DEA_NO:Prct State],
    [PRCT_DEA_NO:Prct City],
    [PRCT_DEA_NO:Prct Zip Code],
    [DISP_DEA_NOisp Last Name],
    [DISP_DEA_NOisp Dea No],
    [DISP_DEA_NOisp ADDRESS],
    [DISP_DEA_NOisp Zip Code],
    [DRUG_NDC_CODERUG DEA CLASS]
    from
    [Prescriptions]
    WHERE

    [PRCT_DEA_NO:Prct Dea NO] = ''AW7444160''
    ')



    Please help me to create a cube.

    Rajesh Kumar

  5. #5
    Join Date
    Jun 2004
    Location
    Atlanta and Manhattan
    Posts
    607

    I Must Be Missing Something ...

    Hi:

    You ask a couple of places for help in building / designing a cube, and then throw a read query into the mix. Are you asking how to go about designing the cube, based upon the fact table describe, or how to "fetch the details" from an existing cube? (Whether querying directly or via a linked server, the cube has to be designed properly first before querying same, as I'm sure you can appreciate).

    Can we take it a single step at a time, so we can give you some useful guidance?

    Thanks!

    Bill

  6. #6
    Join Date
    Apr 2007
    Posts
    5

    way to increase Performance

    Thanks for your reply, As you have said we will take single step at a time.

    As i have show the dim1, dim2... are the dim that i have create. My problem is that if i try to fetch the details from the cube using linked server, for fetching details from the cube takes very very long time. So i need your help to redesign the cube or to figure out a way thru which the performance will be improved.

    Thanks in advance.

  7. #7
    Join Date
    Jun 2004
    Location
    Atlanta and Manhattan
    Posts
    607

    Step Number One is to Remove Variables - and Get More Details

    Ah, now we get to the real question. "Creating a cube" has evolved to optimizing a query - and I assume asking for guidance in modifying cube structure to optimize run time for that specific query; this kind of leads into part of my response ...

    Query optimizaton - be it a direct or linked-server query - is difficult to do without some detail about the cube. What we appear to have here are dimension names, a descripton of a very large fact table, and little else. No idea of hierarchical / attribute structures, no precisely identified measures (I can guess from the list you give, but that might be iffy), etc. We need to focus on aggregation and other settings in the cube to make sure that the cube itself is optimal, before adding the linked-server variable to remove complicating factors.

    I would suggest beginning with a direct query on a local machine to eliminate variables. (I have suggestions on how to optimize the query from the "direct" scenario, once you get there ...). Once the cube is optimal there, you would deploy that design to the linked server and begin again with the query as you originally frame it - if you get the query optimal in a direct situation, you can then begin working with it as close to optimal as possible, in the linked server setting, so as to free you to focus on the connectivity issues, etc. - if indeed they still exist (they may not, once the query has been optimized in its direct context) - and if, indeed, the linked-server scenario was every truly necessary from the start ...

    If the response is "I have no access / capability of working with a direct query against the cube," and / or "I have no access to the cube (on another server), and have to do all my optimization 'remotely,'" then we have a pretty fundamental obstacle. This is a terrible scenario from which to try to optimize what might be a bad cube design - and especially within the context of a single query that you deem "too slow." You'll never be able to know if the query is "taking too long" or not, as you will have no frame of reference for an optimally performing query.

    (Another thing that strikes me is that result dataset of this query would make a report that is challenging to use in a productive way ... ia filtering out of the question (at least in the design stages) - filters that might be parameterized in the end report to make it more consumer friendly anyway?)

    Does that make sense? Can you get a handle on optimization with a direct query first? An approach might be to go with a representative subset of the data in a local server scenario, if the data size is the reason you will say you cannot work with the cube locally ...

    We'll work through it, provided you have some control over all the pieces ... let me know if you can set up a local "lab" environment where you can eliminate the complications first, as I have tried to describe.

    Thanks.

    Bill

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •