I recently came across this question on the Tableau forums. Serendipitously, I had just started doing this and had worked out the rather simple solution. Because someone other than me had raised the question, I thought it might be worth sharing here.
The Problem: Splitting a measure into discrete, sub-measures based on the values of a dimension.
The Why: Lets say you have a measure ‘Sales’ and a dimension ‘Product Type’ (this should sound familiar as its from the ‘Coffee Chain’ sample DB). Further assume that you want to do some parts-to-whole comparison which could be achieved via a table calculation but there are a couple of practical issues with that. One, the table calculation is basically a virtual object and so you can’t use it in downstream calculations and two, if you only want to display a specific dimension-value — ‘Coffee’ for instance – I can’t filter for ‘Coffee’ and still show its part-to-whole relationship correctly. By splitting the measure into discrete sub-measures we can perform operations and calculations on the data that otherwise might not be possible.
Here’s what the post on the Tableau Forum said by way of further illustration:
[H]ave a look at the attached xls file. You see that I have a column called "Indicator". I now would like to make a scatter plot with KPI_1 on the x-axis, KPI_2 on the y-axis, and using KPI_3 to size the circles.
I thus somehow need to split my dimension "Indicator" into 3 specific components for each of the KPI. Can this be done in Tableau ? Or is it mandatory to have basically one column in your data source, per dimension?
The solution is fairly simple although it may not be particularly intuitive. We’ll leverage the LOGICAL function ‘IIF’ which tests whether a condition is TRUE or FALSE and returns a specific response depending on the condition. So, using the Coffee Chain sample Access DB…
- Right-mouse click on ‘Product Type’ in the Dimensions shelf
- Select ‘Create Calculated Field…’
- Name the Calculated Field “Coffee Sales”
- In the formula pane (which should be pre-populated with [Product Type], add the following formula: IIF([Product Type = “Coffee”,[Sales],NULL)
- Click ‘OK’
You should now have a calculated measure in the Measures shelf called “Coffee Sales”. Repeat the above process for each of the other three values in the ‘Product Type’ dimension (Espresso, Herbal Tea and Tea).
Use ‘Duplicate Measure’ and ‘Edit…’ to speed up the process. All you will need to do is change the targeted Dimension value
Now that we’ve created four measures based on splitting values in the dimension of Product Type, we can create an example Scatter Plot to address the question from above.
On a blank worksheet, drag ‘Coffee Sales’ from the Measures shelf to the column shelf – make sure it is set to ‘Continuous’. Drag ‘Espresso Sales’ to the row shelf and make sure it’s set to ‘Continuous’. You should now have a scatter plot with one mark. Probably, we want more detail than a single mark – try right-mouse clicking the date dimension and dragging it to the ‘Level of Detail’ shelf and select ‘Date (Continuous)’. This ought to give you 24 marks. They probably appear to be tightly grouped with little discernable pattern wise, but let’s work with it for a moment.
Try putting ‘Profit’ on the size shelf. Now we get an indication of where the high profit versus low profit intersects are but the data is still highly clustered making pattern recognition difficult.
See? Still a jumbled mess, we’re not learning anything yet but this is exploratory analysis so it’s ok! Let’s try putting ‘Market Size’ onto the ‘Shape’ shelf.
Ok, so Profit isn’t really telling us anything that we couldn’t have logically concluded from ‘Major’ to ‘Small’ market differences. Let’s create a % Net Margin measure. In this case, be sure to use SUM otherwise the % Margin calculation may have unintended values. The formula should look like this: SUM([Profit]/SUM([Sales]).
Still not a lot to see here. What if we drag ‘State’ onto the Color shelf?
Now we’re getting somewhere! Adding State-level detail gives us a lot more marks to work with, but the overlapping points are still making it harder to read than is necessary. Let’s make some formatting changes to see if we can make things a little more clear. Start by increasing the transparency of the mark colors – I ended up at about 50% transparent. That definitely helps to see ‘through’ the overlapping points as well as the density of marks. Make sure that the scales on the x and y axes are the same – it’s a scatter plot afterall. I also changed the pane background to black (yes, black) and added light gridlines. In this case, I think both help make discernment easier.
Enable highlighting from the Color legend, then you can focus on a particular state in the visualization. Most states exhibit what most of us would be ‘normal’ patterns – either no particular relationship between Coffee and Espresso or as Sales in one grows, so does the other. A couple of outliers though are apparent though. Try selecting Ohio. In the ‘Small’ markets product sales are binary – that is, either Espresso is sold or Coffee is sold but not both. On the other hand, in the Ohio ‘Large’ markets, we revert to one of the ‘normal’ patterns. Now select ‘Massachusetts’. What? There appears to be a downward trend in Espresso sales as Coffee sales increase. This would seem to indicate a more graduated preference pressure on Product Type than we saw in Ohio. Further analysis of Massachusetts would be necessary to understand the graduated sales and if that pressure can be used to improve sales or profits.
Alright, so that’s it. A tutorial on how to create dimension-value specific measures that turned somewhat into a discussion of exploring data to find interesting patterns. All that’s left is the discussion…
FILE ATTACHMENT: Split Dimension Tutorial Packaged Workbook