Sunday, December 12, 2010

Dynamic Cube Partitioning in SSAS 2008

In this article I am going to explain the Implementaion of Dynamic Cube Partition in SSAS 2008. I would want to split this article in two parts as mentioned below:
  1. Partition benefits
  2. Implementing Dynamic Partitions
Partition Benefits:
  • Better Query Performance: Cube partition is a powerful mechanism for improving query performance. Queries that summarize data over 10 years could take considerably longer than those that only search through the current year data. If we have proper partitions then SSAS only has to scan a small subset of data to return query results hence dramatic performance improvements compared to queries running against a cube with a single partition.
  • Minimize downtime:  Cube partitioning supports reducing downtime associated with cube processing. In almost all the cases, a portion of data warehouse is volatile and needs to be processed often. However other portions are relatively static. For example, in a sales cube, we need to change the current year's data nightly, but sales from previous years might change only occasionally - in case if account for merchandise returns and exchanges. If your warehouse tracks last 10 years salesthen processing only the current partition may be 10 times quicker than processing the entire cube.
  • Aggregations benefits: The partition queried frequently could benefit from additional aggregations, which in turn could improve performance. Partition(s) that are used less can be processed less frequently with considerably fewer aggregations.
  • Customized storage and processing settings: Frequently accessed partitions might benefit from proactive caching and ROLAP storage. On the other hand, other forms of storage and processing might be better for less frequently queried partitions.
  • Distributed query and processing load: SSAS allows you to create remote partitions - a remote partition resides on a server different from its parent cube. This way the queries that affect the remote partition are processed on a server separate from its parent cube, allowing you to take advantage of additional processing power.
  • Parallel Partitions Processing: SSAS allows processing multiple partitions in parallel on a server that has multiple processors. This can further reduce the total cube processing time.

 Implementing Dynamic Partitions

In this section I will explain step by step operations to create Dynamic Cube Partitions.

STEP1:
Preapare test data for Sales cube. I will use three dimensions (DimDate, DimProduct, and DimCustomer) and one fact table for Sales cube. Click here to download CubePartitionData.sql file which contains T-SQL code to generate these tables along with sample data.

STEP2:
Create new Analysis Services Project using BIDS and save it as Sales.

STEP3:
Create Data Source, and then Data Souce View using above three dimension tables.

If you are newbie, click Creating First Cube in SSAS 2008 for more information.

STEP4:
Create Sales cube  along with required dimensions. Set all  required properties of dimensions and create any hierarchies if required. My Solution looks like this:

STEP5:
Double click on the cube and navigate to Partition tab. You will see a default partition as shown below:

Delete this default partition and Click on new partition... to click new partition. In Partition Wizard, Select FactSales as available table in Specify Source Information page and click next. Select Specify a query to restict row in Restrict Rows page and write WHERE condition to restrict partition rows.

Click Next twice and enter Partition name Sales_20100101-20100131 in the Completing the Wizard page and click Finish.
Finally Process the cube. Once cube is processed successfully, you can see a new database in Analysis Services:

You can browse the data for this partition:


STEP6:
We are done with one manual partition. Now its time to create the partitions dynamically and apply a logic to process the delta data every time.

First off all, right click on the Partition Sales_20100101-20100131 (highlighted above)  and then select Script Partition as ==> Create To ==> New Query Editor Window. It will create XMLA scripts for this partition. Save this file in the project with CreatePartitionSales_20100101-20100131.xmla name.

In similar fashion, right click on the Partition Sales_20100101-20100131 and then select Process. This will open Process Partition wizard, click Script to generate XMLA script and save this file in the project with CreatePartitionSales_20100101-20100131.xmla name.

STEP7:
We are done with SSAS development! Now we have to create an SSIS package to implement dynamic creation of cube partition and processing of that partition.

Click on File => Add => New Project in menu bar in the same solution. Select Integration Services Project and enter the name of the SSIS Project.
Rename the SSIS package1.dtsx by DynamicSalesPartition.dtsx.

STEP8:
Add two connection Managers - one OLE DB connection manager for database and another Analysis Services connection manager for SSAS database.
STEP9:
Add following Package variables:

Here is the brief description of each variable:
Directory - To store package path
FromDateKey - To Store current partition start date key e.g. 20101201.
ToDateKey - To Store current partition end date e.g. 20101231
PartitionName - Current partition name e.g. SalesPartition_20101201-20101231
IsPartitionExists - To check whether partition alreday exists or not
CreatePartitionXMLA - To store XMLA script to create current partition
ProcessPartitionXMLA- To store XMLA script to process current partition
Note: You can add more variables as per need and business requirement.

STEP10
Now you have to develop SSIS package as shown below:


First Execute SQL Task is used to initialize FromDateKey, ToDateKey, and PartitionName for the current partition. Use below query in this Task:
DECLARE
   @FromDateKey varchar(8), @ToDateKey varchar(8),
   @CalendarYear int, @CalendarMonth int


SELECT
   @FromDateKey = MAX(FromDateKey),
   @ToDateKey = MAX(ToDateKey)
FROM PartitionLog (NOLOCK)


SELECT
   @CalendarYear = MIN(CalendarYear),
   @CalendarMonth = MIN(CalendarMonth)
FROM DimDate WHERE DateKey > @ToDateKey + 1


--SET FromDateKey and ToDateKey for current Partition
IF NOT CONVERT(varchar(8),GETDATE(),112) BETWEEN @FromDateKey AND @ToDateKey
SELECT @FromDateKey = MIN(DateKey), @ToDateKey = MAX(DateKey)
FROM DimDate (NOLOCK)
WHERE CalendarYear = @CalendarYear AND CalendarMonth = @CalendarMonth


SELECT @FromDateKey AS FromDateKey, @ToDateKey AS ToDateKey,
'Sales_' + @FromDateKey + '-' + @ToDateKey AS PartitionName
GO
Second Task in the package is Script Task. Here we need to read the contents of files
CreatePartitionSales_20100101-20100131.xmla and
ProcessPartitionSales_20100101-20100131.xmla and store in the package variables CreatePartitionXMLA & ProcessPartitionXMLA.

I am using User::Directory as ReadOnlyVariable and User::CreatePartitionXMLA,User::ProcessPartitionXMLA as ReadWriteVariables.


Click here to download the code for Script Task - Read XMLA Files and store in variables.

To know about reading file contents using Script Task, click Script Task to Read File.

Third Task is a sequence container which contains three task - Script Task, Analysis Services Execute DDL Task, and Execute SQL Task.

Script Task is used for two reasons -
1. To check whether current partition exists or not.
2. To update XMLA content for current Partition.

This is the most tricky and interseting part here. We need to replace Partition ID, Partition Name, and Source QueryDefinition for current Partition which are highlighted below:

In Script Task, select User::FromDateKey,User::PartitionName,User::ToDateKey as ReadOnlyVariables and User::CreatePartitionXMLA,User::ProcessPartitionXMLA,User::IsPartitionExists as ReadWriteVariables.

Click here to download the code for Script Task - Generate XMLA to create and Process new Partition.

Click SSIS Script Task to Find and Replace File contents to know more about find and replace mechanism using script task.

Second task in the sequence Container is Analysis Services Execute DDL Task. This task is used to create new Partition. In Analysis Services Execute DDL Task Editor, select SSAS.localhost as Connection, Variable as SourceType, and User::CreatePartitionXMLA as Source.

Also don't forget to set precedence constraint. You should execute this task only when partition does not exists. Open Precedence Constraint Editor and Select Expression and Constraint in Evolution operation, Success in Value, and @IsPartitionExists == 0 in Expression as shown below:

Third Task in the Sequence Container is Execute SQL Task - this is used to store newly created Partition information in the PartitionLog table. In Execute SQL Task, select ResultSet None, ConnectionType OLE DB, Connection (local).Test, SQLSourceType Direct input and SQLStatement as:

INSERT INTO dbo.PartitionLog
([PartitionName],[FromDateKey],[ToDateKey],[CreatedDate])
VALUES (?,?,?,GETDATE())

Click on Parameter Mapping and map the parameters as shown below:

STEP11
Finally we need to process the current partition. This is independent of previous sequence container where we are creating new partition if it is not exists.
However, we need to process the current partition every time we execute the package. In most of the cases, new partition will be created on the first day of month but partition should be processed everyday to include delta data.

This sequence container also contains three tasks.

First task is Analysis Services Processing Task - to process the dimensions data. Select SSAS.localhost as connection manager and all the dimensions in Object Name - which are not static in data. Select Process Update as Process Option.

Second Task is againg Analysis Serives Processing Task - to process Current Partition. Select SSAS.localhost as connection manager and Sales_20100101-20100131 in Object Name - which is default partition (or first partition). Select Process Full as Process Option.

Now click on Expression and select @[User::ProcessPartitionXMLA] in ProcessingCommands.

Finally store the Last Processed Date and Processed Count (number of time this partition has processed) in the log table using Execute SQL task. Select ResultSet None, ConnectionType OLE DB, Connection (local).Test, SQLSourceType Direct input and SQLStatement as:

UPDATE PartitionLog
SET LastProcessDate = GETDATE()
,ProcessedCount = ProcessedCount + 1
WHERE PartitionName = ?
 
Click on Parameter Mapping and map the parameters and map the parameter 0 with User::PartitionName.


STEP12
We are done with package development. Now its time to execute the package and check the dynamic behaviour of Partition creation and processing. Before execution the package, there is only partition in the SSAS Sale database.
Now Right Click on the package and execute it. Once package is execute successfully, you will see one more partition in the Sales database and one more entry in PartitionLog table.


 
Now Execute the package or schedule it to execute automatically. Once you are done with 10 more executions, you will reach to current month partition Sales_20101201-20101231. and you can see 12 partitions - one partition for each month as shown below:


Since you are in current month partition, if you execute the package every day in current month - it will not create any new partition till beginning of next month. It will only process Current Partition to include the data in the cube as shown below:



Now you can browse the data for all the partitions:

We are done with Dynamic Package creation.
I hope you will like this article and if your answer is yes then don't forget to click Like :-)


Many people have had asked me to share the solution. I have stored that in my SkyDrive so you can download it from here: Dynamic Cube Partition

Cheers!!!

 

35 comments:

  1. well i have tried all the process upto SSAS cube building and i am new to SSIS so could not complete the package design as above.
    so could you send me the package that you build to describe this blog ..if you have it please can you send it for me ?..that will be helpfull :)

    Thanks
    Anil Maharjan

    ReplyDelete
  2. Hey, This is a fantastic example. However, your sample code files are not available for download - i can probably extrapolate and figure out the content, but it would be nice to just have it as presented. Do you think you could fix the links, please?

    Thank you,

    Fahim.

    ReplyDelete
  3. @Anil - Share your email, I'll send you the pkg

    ReplyDelete
  4. @Fahim - I guess everyone should be able to download sample code files as those are public. I tried with different Gmail IDs and it worked fine. Does anyone else face the same issue?

    ReplyDelete
  5. Thanks for response Hari,
    Welll you can send me the package in this email add as anil_pas@hotmail.com or mepasanil@gmail.com

    Hope you will send the package soon..
    Thanks
    Anil Maharjan

    ReplyDelete
  6. Thanks Hari for sending me package but i got some errors with the package as :

    Error at Analysis Services Processing Task [Analysis Services Processing Task]: Processing commands contain DDL that is not valid.
    Error at Analysis Services Processing Task: There were errors during task validation.

    (Microsoft.DataTransformationServices.VsIntegration)

    i coludnot figureout what is going on behind.. :(

    ReplyDelete
  7. ohh i got a solution now,and finally solved the error too hope you will help me out further too.
    well is there any blog\link from where we can easily learn the SSIS from basic.?
    thanks again
    Anil Maharjan

    ReplyDelete
  8. hello hari,
    after successful execution of package ,the partition are created but while i browse the cube it dose not show the required result\output as yours.
    i figure out that for all 12 partition the QueryDefinition are same as: "SELECT [dbo].[FactSales].[DateKey],[dbo].[FactSales].[ProductKey],[dbo].[FactSales].[CustomerKey],[dbo].[FactSales].[SalesAmount]
    FROM [dbo].[FactSales]
    WHERE [DateKey] between 20100101 and 20100131"

    why it is not update i cannot figure it out.
    so please help me out.
    Thanks
    Anil Maharjan

    ReplyDelete
  9. Hello hari,
    I got the solution finally.There is some silly mistake i have done as C# renders the condition "[DateKey] between 20100101 and 20100131" as case sensitive so finally i fix it an got a solution.
    Thanks agian,
    Anil Maharjan

    ReplyDelete
  10. hi hari can u please send me the package to karthik.ramesh@sryas.com??

    I followed the steps... its gives me error in the script task..propably the BIDS which i am currently using dont support c# code.. I am working in 2005...is that fine????2008 is not allowed fro me but i must do the dynamic partioning..

    Please help

    ReplyDelete
  11. Amazing! The same is conceptually explained in Expert Cube Development with SSAS 2008 book. The authors also present a solution with that book. But I found the one presented here, to be the simplest. Thanks.

    ReplyDelete
  12. Hi Hari

    i followed your guidelines step by step using my own cube. In the end I cannot see the partition under cube in solution explorer when i process the cube the newly created partition has been processing which is fine, but in order to generate the script i cannot see the partition in the solution explorer...

    any help

    ReplyDelete
  13. I have found the way to get xmla script :)

    there is one more problem in the cube processing task when i give variable for processing the current partition it says

    TITLE: Package Validation Error
    ------------------------------

    Package Validation Error

    ------------------------------
    ADDITIONAL INFORMATION:

    Error at Analysis Services Processing Task 1 [Analysis Services Processing Task]: Processing commands contain DDL that is not valid.
    Error at Analysis Services Processing Task 1: There were errors during task validation.

    (Microsoft.DataTransformationServices.VsIntegration)

    do you have any suggestion on this.

    regards

    harris

    ReplyDelete
  14. @Harris,
    For your first Question - I have mentioned how to generate XMLA script in STEP5.

    For your second question - Please go throgh Step 11 once. To avoid any validation errors, you need to select the default partition to be processed (which is already processed by you). At run time, variable will take of running partition.

    Does this make sense?

    ReplyDelete
  15. Hi Hari

    For the first question its no more problem now.


    For the second one your answer does make sense. but I went through the full step 11 i have selected the available partition the default one for processing. then in expression i have selected the Partition Processing Variable. when i run the package now it still gives me the same error. when i open Analysis Services processing task now I cannot see the default partition in object list. there is nothing every thing is removed.

    Can you please comment on this?

    ReplyDelete
  16. Hari
    I am using a variable @ProcessXMLA to process a partition. The problem now is if I put this XMLA code in Analysis Services Processing Task's Processing Command property without setting any expression the task works fine but if I use the same as a variable in expression it gives me the error which I mentioned in earlier post.

    Thanks for your help.

    Harris

    ReplyDelete
  17. Hari

    The variable for processing the fact is working now so please ignore the above posts. There was apparently error in initializing the variable.

    Now I have some thing else for you :)

    When i run the package it creates new partition and process the dimensions and the freshly created partition. The problem is if I am processing Partition for December it deletes all the old records from the cube and only December is available for viewing. I need to keep the old records in cube and process the new partition to add new records.

    Any thoughts?

    Cheers

    Harris

    ReplyDelete
  18. Harris,
    If you look at ProcessPartitionSales_20100101-20100131.xmla file, it will always process only this partition because ProcessOption is used as ProcessFull. At run time variable ProcessPartitionXMLA will convert the static partition into running partition.

    However, In your Dimension Processing Task, make sure the ProcessOption for dimensions are ProcessUpdate.

    ReplyDelete
  19. Hi Dear
    Thank, it's excellent example.
    Can you send me your sample code?
    My Email Address is:hamedpalik@gmail.com

    ReplyDelete
  20. Could you pleease send me project too; mtnrkl@gmail.com

    ReplyDelete
  21. Could you pleease send me project too? hongseboy@gmail.com

    ReplyDelete
  22. Hi,
    Thanks for the detailed explanation.
    I applied your instructions step by step on my cube, and managed to create partitions up until the current month (Oct 2012).
    Problem is, once it got to the current partition, it won't skip the Execute DDL Task, and it fails the package, saying this partition already exists.
    I placed a PostExecute Breakpoint on the second Script Task, and realized that the variable IsPartitionExists keeps getting the value 0, even though the partition already exists and as per the script it should get value 1, hence skip the DDL Task (that's what the precedence constraint is for).
    Any idea of why does the script keeps placing value 0 to the variable?

    Thanks,
    Muli

    ReplyDelete
    Replies
    1. Never mind, I solved the problem.
      Apparently for some reason the script requires that any object ID will be identical to that object's name (even though the script dosen't even have parameters object IDs). In my case I had a MeasureGroup ID different than the MG name, so it didn't recognize the partition as existing.
      Only after re-creating the MG with same ID as the name, it passed the value 1 to IsPartitionExists variable.

      Delete
  23. Can you send me the project also? rkolofer21@gmail.com

    Thanks a lot!

    ReplyDelete
  24. I'm having difficulties with the Step 11. In Analysis Services Processing task I add the 'Sales_20100101-20100131' partition as an object. I add an expression with a Property of 'Processing Commands' and drag and drop the variable @[User::ProcessPartitionXMLA]into the Expression. When I save the package and edit this task the object has disappeared. If I try and run the package without saving I have a Package validation error - 'Processing commands contain DDL that is not valid'. I manually ran the modified script in SSMS and it worked fine.

    ReplyDelete
    Replies
    1. I had to research the same problem. You have to set the "Delay Validation" property of the task to "True." That solved the problem for me.

      http://mohitnayyar.blogspot.com/2007/10/selective-cube-measure-groups.html

      Delete
  25. Great article! I can't wait to see the project.
    Could please forward the project to gasblr@gmail.com

    Thank you very much.

    ReplyDelete
  26. Hi Guys ,
    I saw some of you got the package , I asked
    Hari 100 times to provide it to me without any result ,
    can somebody send the package to vali_loghin@yahoo.com

    Thank you in advance !!!
    Val

    ReplyDelete
  27. I am getting many request offline to send the solution. To make it easier, I have added a link at the end of the article to download the solution!

    ReplyDelete
  28. Hey Hari,
    First of all thanks for this amazing post.

    I have run into a problem which I cant seem to get a handle on, so I am bothering you. In the second script task I see an error "The binary code for the script is not found". I built the script code alone and found that it could not find Microsoft.Analysis Services.dll. Are you using SQL Server 2005 for the solution because this comes when we migrate it to SQL Server 2008. I am using 2008 so I am not sure what else can cause this issue.
    Do you have any idea how to fix that?

    ReplyDelete
  29. Thanks for the great post. Helped me out a ton.
    What if we want to retrieve the partition size for each partition created and processed. Then output to a the partitionLog table?

    ReplyDelete
  30. This comment has been removed by the author.

    ReplyDelete
  31. An excellent guide and a very well structured process, exactly what I've been looking for. Thanks so much!

    ReplyDelete

Note: Only a member of this blog may post a comment.