Following up on my previous post: I need to distribute some external promotional cost down to the level of my order data.

Let's lay out the scenario:
  • The list of websites we monitor is classified by country and by "segment", based, for instance, on the core services they provide.
  • As the parent company behind all these websites, "we" promote, on a monthly basis, across several of those segments and in specific countries.
  • At the end of the month we summarize the bills and get a table that tells us: for this segment and country, we spent X amount of money this month.


The objective...
  • First, for performance tracking, we want to be able to spread this cost against the orders we received. Obviously it has to be done properly: match cost per segment/country/month to orders per segment/country/month, then bring this down to the website and daily level, because monthly analysis isn't granular enough. So if we spent $30,000 on segment A, which has 2 websites, over a 30-day month, and if site 1 had 1,000 orders and site 2 had 500, then the incremental cost per order is $20: site 1 incurred an additional cost of $20,000 and site 2 $10,000. Having this $20 factor, I can now report at the daily level for each site (the numbers are worked through right after this list).
  • Getting the results for the past month is all good. In addition, I need to be able to project this external marketing cost for the current month. Having estimated the cost per order for segment A at $20, I want to assign this cost to each order that occurs in the current month on any website in this segment, so that I can project, based on the number of orders, how much the external promotional cost will be.
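
To make the allocation concrete, here is the arithmetic from the example above laid out:

  Segment A, 30-day month:  cost = $30,000;  orders = 1,000 + 500 = 1,500
  Cost per order          = $30,000 / 1,500 = $20
  Site 1 allocation       = 1,000 * $20 = $20,000  (then spread over the 30 days)
  Site 2 allocation       =   500 * $20 = $10,000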


So, there are multiple ways to do this…

One of them is to simply add a "cost" column to the relational order table and aggregate that field. The issue I have with this solution is that I end up writing estimates against real data. Besides, at the end of the month, when I get the real cost numbers, I have to rewrite everything! That's okay for a few thousand rows, not for half a million. It would also ruin the incremental update system I have set up: by changing data on that many orders, I'd be better off refreshing the whole cube than running the incremental process!

I did find a solution, outlined in the cell calculation from my previous post; however, I'm not satisfied with it because it forces me to maintain an aggregation table. Let me explain with a simpler context:

Orders are "dimensionalized" against three dimensions: website, time (daily) and scenario. Off of the website dimension I have two member properties, segment and country, which spawn two virtual dimensions: Segment and Country.

External costs are provided "dimensionalized" against: segment, country and time (monthly level only).

The ideal solution would be the ability to say, at aggregation time: "for this website, today, this is the cost factor to apply per order, because this is the current month, and this website belongs to this segment, in this country. For this segment, country and month our cost was X and globally we had Y orders this month, so cost = (number of orders today) * X / Y."
I couldn't figure out how to do this because it involved LookupCube at some point or another and, as I mentioned in my previous post, I ran into issues with it in Excel.
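
For the record, here is roughly the shape of the formula I had in mind. This is only a sketch: the cost cube name ("ExternalCost"), the "Segment"/"Country" member properties and the [All Website] member are placeholder names, and it assumes the month members carry the same unique names in both cubes:

  -- cost factor = segment/country/month cost (looked up in the cost cube)
  --               divided by segment/country/month orders (from this cube)
  [Measures].[Order Count] *
  LookupCube("ExternalCost",
      "([Measures].[External Cost], "
      + "[Segment].[" + [Website].CurrentMember.Properties("Segment") + "], "
      + "[Country].[" + [Website].CurrentMember.Properties("Country") + "], "
      + Ancestor([Time].CurrentMember, [Time].[Month]).UniqueName + ")")
  /
  ( [Measures].[Order Count],
    StrToMember("[Segment].[" + [Website].CurrentMember.Properties("Segment") + "]"),
    StrToMember("[Country].[" + [Website].CurrentMember.Properties("Country") + "]"),
    [Website].[All Website],
    Ancestor([Time].CurrentMember, [Time].[Month]) )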

So I went another route. While it works, I consider it fairly ugly: I am forced to deploy the cost at the website and daily level and maintain it in a separate table off of which I build a cube. Through a virtual cube and a cell calculation of the form (formula from the previous post):

  [Measures].[Order Count] *
    CalculationPassValue((…untied dimensions…, [Measures].[External Cost]), 0) /
    CalculationPassValue((…untied dimensions…, [Measures].[Order Count]), 0)

I get the result I want.
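
Spelled out with made-up member names, the "untied" slots would look something like this — the idea being to tie website back to its All member and time up to the month, while the Segment and Country virtual dimensions keep their query context:

  [Measures].[Order Count] *
    CalculationPassValue(( [Website].[All Website],
                           Ancestor([Time].CurrentMember, [Time].[Month]),
                           [Measures].[External Cost] ), 0) /
    CalculationPassValue(( [Website].[All Website],
                           Ancestor([Time].CurrentMember, [Time].[Month]),
                           [Measures].[Order Count] ), 0)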

However, again, this is quite inconvenient: I have to maintain that table and it becomes part of the refresh process… When I really have only 500 cost values provided to me, I end up with a table of 300,000 records to perform this distribution. In the greater scheme of things it doesn't matter much; the refresh process takes barely a minute and still lets me hold the SLA I imposed on myself of having live data updated every 30 minutes. However, I fully suspect this will become one more step toward eventually not being able to maintain that SLA as the system grows more complex and other cubes are added.

As I typed this, I came up with a possible idea to work around this issue. I still need to experiment with it, and since I'm mostly learning through trial and error, I'm not sure whether it will work… The idea is to force the calculation of the cell at the lowest level (using the properties of the website dimension at that level), so that a cost is dynamically calculated for each website and day regardless of the query selection; these values would then be summed up through aggregation to give the proper result.
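
Concretely, what I intend to try is a cell calculation along these lines. Again, this is only a sketch with placeholder names ([All Website], the "Segment"/"Country" member properties, the "Website" and "Day" level names):

  CREATE CELL CALCULATION CurrentCube.[Daily Cost Allocation]
  FOR '( {[Measures].[External Cost]} )'
  AS '[Measures].[Order Count] *
      CalculationPassValue(
        ( [Measures].[External Cost],
          StrToMember("[Segment].[" + [Website].CurrentMember.Properties("Segment") + "]"),
          StrToMember("[Country].[" + [Website].CurrentMember.Properties("Country") + "]"),
          [Website].[All Website],
          Ancestor([Time].CurrentMember, [Time].[Month]) ), 0)
      /
      CalculationPassValue(
        ( [Measures].[Order Count],
          StrToMember("[Segment].[" + [Website].CurrentMember.Properties("Segment") + "]"),
          StrToMember("[Country].[" + [Website].CurrentMember.Properties("Country") + "]"),
          [Website].[All Website],
          Ancestor([Time].CurrentMember, [Time].[Month]) ), 0)',
  CONDITION = '[Website].CurrentMember.Level.Name = "Website" AND
               [Time].CurrentMember.Level.Name = "Day"'

The open question is whether aggregation will actually pick up these calculated leaf cells and sum them, which is exactly what I need to test.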
Is this the way to do it? Any other suggestions, maybe?

Thanks,


Seb