Disambiguation
Posted March 15th, 2006 by Clint
Edit: I love free advice, even if it points out my noobiness. Please see Robbin’s comments at the end of this post on my faux pas. Thanks to Robbin, I have gone back into the post and added some links to external sources where appropriate.
One of the goals of visual report design is to disambiguate, or make clear, the differences between things (data). When working with charts, there are some choices that you have to make that will either help or hinder your effort to make things clear.
- Chart Type
- Multiple or Single Plots
- Color Choice
- Inclusion of Ancillary Information
My basic rule of thumb is this: does every element contribute to overall understanding proportionally? Note that I didn’t say equally; some elements (like the data itself) are more important and therefore take on a greater responsibility than others.
By way of example, we’ll be looking at some charts I created using basic traffic data from Instant Cognition from March 7, 2006 to March 15, 2006 (thanks to StatCounter for free basic stats).
OOTB (Out-of-the-Box or default) Chart from Excel
So let’s walk through the choices…
1. Chart Type
A common mistake is picking the wrong chart type. In this case I selected a column chart to display time-series data - there are usually better options, such as line or scatter plots.
2. Multiple or Single Plots
Another common mistake is to group data into multiple plots on a single chart on the assumption that they are similar. In this case I am plotting all of my basic traffic stats on this one chart. One problem that arises here is the relative scale of the different plots; the page load scale is so much larger than the others that it makes it harder to distinguish trends for anything but page loads. Additionally, any time a chart has multiple plots, the amount of work that you have to do to make the information in the chart meaningful and understandable escalates.
3. Color Choice
Report Design pundits like Tufte often warn against using color (but will soften the warning by saying that it’s because if you need to print or photocopy the chart, the color - and the information that it provides - will be lost). Really what they mean is that using color well in charts, as in any design, is difficult, and if you don’t apply it very carefully the color will at best distract from the information in the chart and at worst change it in unanticipated ways. We can see that by default Excel makes poor color choices. Sure, the columns are visually distinct from each other by way of color. But the plot background actually becomes the single most important element in the chart by the sheer volume of gray that it displays. Next in importance visually is the legend, followed by the horizontal rules and then the chart title, while the plots themselves come in nearly last in terms of their visual importance - shouldn’t they be first in importance?
4. Inclusion of Ancillary Information
By default, Excel has included the Legend, the Chart Title, X-Axis values, Y-Axis values, horizontal rules and plot area as ancillary pieces of information - most with disproportionately large visual importance. The net effect of this OOTB experience is that the chart does not communicate well. The data is not visually important and it is difficult to interpret what the data might be saying. BTW, what are the values for each of the columns? We can estimate their values at least for the larger ones like page loads but we don’t know for sure.
A Better Solution…
1. Chart Type
Rather than using the column chart I have switched to a line chart which makes it easier to see the trends (relationships) between the data points.
2. Multiple or Single Plots
As described earlier, multiple plots on a single chart make my job harder, and being lazy I separated the four plots into four different charts - giving each its own spotlight, so to speak. Note that I have resized the individual charts so that they take up about the same amount of space as the original chart.
3. Color Choice
First thing to go - plot background color - it was hurting way more than it helped.
Next I set the plot line to a solid navy blue which gives it some added importance and is clearly visible. Almost any color will work in this particular scenario (although you might give some thought to what a particular color indicates. Navy Blue is a nice cool color with neutral connotations as opposed to say Red which says ‘Danger, Danger Report User’) but be consistent - using a different color for each line doesn’t provide additional information and will slow down a user’s comprehension of the charts - they have to think about what each different color might mean.
Then I changed the horizontal rules to a medium gray and changed their thickness to the finest available - they are now more of a hint or suggestion - our brains will do the rest.
4. Inclusion of Ancillary Information
Chart titles - these are important, so we need to keep them.
Legend - I almost never use the legend because it’s very difficult to position and order in such a way that it relates directly to the plot(s) in the chart and since I am only using one plot per chart the legend is unnecessary - it doesn’t add value.
Vertical Scale - I scaled each chart appropriately for the data and then to maintain some visual continuity (which eases understanding) I set the vertical scale to divide the height into four segments - it’s the same on each chart which means that we don’t have to re-understand the scale for each one.
Data table - I’ve included the data table at the bottom of each chart so that we can quickly see the value of any point in the plot. Another couple of benefits to using the data table are that I could get rid of the X-Axis labels (they are the same in the table) and the table names the series for me. By the way, I’d like to apologize now to Tufte et al. for using such small data sets in my charts - I know it violates one of the cardinal rules, which is to use charts and graphs for large data sets only.
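The vertical scale choice above (four equal segments on every chart) was set by hand in Excel, but the arithmetic behind it can be sketched in a few lines. This is only an illustration, not what Excel does internally: the function name `four_segment_axis`, the list of "nice" step sizes and the sample values are all assumptions made up for the example, not figures from my traffic data.

```python
import math


def four_segment_axis(values, segments=4):
    """Return tick positions that split a zero-based axis into equal segments.

    Hypothetical helper: the top of the axis is rounded up to a "nice" number
    so that every chart gets the same number of divisions.
    """
    peak = max(values)
    if peak <= 0:
        raise ValueError("expected at least one positive value")

    # Smallest step that would cover the peak in the requested segments.
    raw_step = peak / segments
    magnitude = 10 ** math.floor(math.log10(raw_step))

    # Round the step up to the next "nice" size (assumed list of candidates).
    for nice in (1, 2, 2.5, 5, 10):
        step = nice * magnitude
        if step * segments >= peak:
            break

    return [step * i for i in range(segments + 1)]


# Made-up daily counts, purely for illustration.
sample_counts = [12, 30, 47, 22]
print(four_segment_axis(sample_counts))
```

With a made-up peak of 47 the step rounds up to 20, so the axis runs from 0 to 80 in four segments. The point is the consistency: each chart gets its own range, but the same number of divisions.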
One of the things that you might notice right away in this four-chart solution is that three of them (page loads, unique visitors and new visitors) all follow a very similar trend while returning visitors have a very different trend which is something that is nearly invisible in the OOTB chart.
One final benefit to the solution above: although I didn’t put it in, the four-chart solution leaves me space for annotation on the plots. So for instance, I might have annotated March 9th on the page loads chart as the day that Eric T. Peterson posted a response on his WebAnalyticsDemystified blog about my Dashboard post. But we’re not interpreting my traffic in this post, just using it as example data for the charts.






March 16th, 2006 at 13:15:00
Hi Clint.
As you already know, I love to tell other bloggers how to blog so that their posts meet my needs. (Selfish, selfish.)
It would be great if you would hyperlink more among your own posts. For example, when you wrote about Eric’s comment on a prior post, I wanted to read the post. But it wasn’t hyperlinked in the feed so I had to go to the site and then guess at which post you were referring to. This will also increase your page views and time on site; too bad there’s no conversion element here…
Let’s compare blog ideas in Santa Barbara.
Robbin Steif
LunaMetrics
My web analytics and conversion blog
March 17th, 2006 at 17:46:00
Very nice post Clint. I always love the use of the word disambiguation. It reminds me of a time I had as my title, “Director of Disambiguation”. That sure was a conversation starter with the clients.