Spot the Chart Lie – Alley Insider Edition

December 31, 2009

Yesterday, the Alley Insider published the chart at left as their “Chart of the Day”. The chart shows trends in the number of people employed by the Newspaper industry from 1947 – September 2009 as estimated by the Bureau of Labor Statistics. At first blush it seems ok, but the title (The End of Newspapers) begs for closer inspection.

The issue here, if you haven’t already spotted it, is that the y-axis starts at a non-zero value (200K). While it is acceptable, under certain conditions, to use a non-zero y-axis, this isn’t one of them.

A non-zero y-axis on a trendline is typically used to expose patterns in the trend that otherwise would not be visible, in cases where the absolute value of any given point on the line is not that important. This trendline already has a pronounced pattern, so a truncated axis does not expose any new information. If you don’t intentionally look at the values on the y-axis, you might assume that the level where the trendline starts on the left is zero, and so that the employment numbers for newspaper publishers are quickly approaching bottom. That is simply not true, so the chart is being used to lie to you.
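
To put a rough number on that exaggeration, here is a minimal Python sketch. The employment figures are made up for illustration (they are not the BLS values), and `drawn_height_ratio` is a hypothetical helper name; only the 200K axis start comes from the chart.

```python
def drawn_height_ratio(current, peak, axis_start=0.0):
    """How tall the current point is drawn relative to the peak,
    given the value at which the y-axis starts."""
    return (current - axis_start) / (peak - axis_start)


# Hypothetical employment figures in thousands (NOT the real BLS series).
peak, current = 450.0, 280.0

honest = drawn_height_ratio(current, peak)            # y-axis starts at zero
truncated = drawn_height_ratio(current, peak, 200.0)  # y-axis starts at 200K

print(f"zero-based axis: current point drawn at {honest:.0%} of the peak height")
print(f"axis from 200K:  current point drawn at {truncated:.0%} of the peak height")
```

With these invented numbers, a decline of about 38% is drawn as though it were a decline of 68%, which is the visual effect described above.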

Now, I’m not actually arguing that the newspaper business isn’t in trouble. Circulation is down, ad revenue is down, yada yada yada. Looking at the graph, we can see that the most recent months have the same employment levels as those of the early 1950s. But again, without some context, the number of people employed in the industry probably isn’t the best indicator of sector health.

So here’s how I’d look at this single measure of the Newspaper industry:

[Interactive Tableau visualization: monthly and annual newspaper employment trends, 1947–2009]

The two sets of graphs show monthly and annual employment estimates for the Newspaper industry from 1947 to 2009. The first chart in the monthly or yearly block simply trends the number of employees from period to period across the entire set. The variable width of the line also encodes the number of employees and is used to reinforce the relative weight of one period compared to another. The yellow band on the employment trend charts covers the minimum number of employees to the maximum – basically the vertical height covered by the Alley Insider chart. You can see that there is quite a bit of white space underneath the yellow band, and therein lies the chart lie. Changing the start value of the y-axis doesn’t expose any new insight about the trend; it exaggerates the line slope and makes it seem as though the employment numbers are bottoming out – which they are not.

The second chart in each block shows the percent (%) change period-over-period (M/M or Y/Y) in number of employees. In this case, the yellow band shows the normal variation (mean +/- 1.96 standard deviations). For the monthly chart, the gray line is the actual M/M values and the red line is the 6-month moving average. It’s interesting that, in the monthly M/M trend, the vast majority of points are within the normal variation. It’s not until November 2007 that we see strong downward pressure in the M/M numbers. In the yearly chart, the yellow band is still normal variation as defined above and the gray line is the actual Y/Y values. In this case, the red line represents the 5-year moving average of Y/Y change.
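
For anyone who wants to reproduce those three series outside Tableau, a rough Python sketch follows. The employment numbers are invented, the function names are mine, and I am assuming a sample standard deviation and a trailing moving average; the workbook may compute either differently.

```python
from statistics import mean, stdev


def pct_change(values):
    """Percent change from each period to the next (M/M or Y/Y)."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(values, values[1:])]


def normal_band(changes, z=1.96):
    """Lower and upper bounds of mean +/- z standard deviations."""
    centre = mean(changes)
    spread = z * stdev(changes)  # sample standard deviation (an assumption)
    return centre - spread, centre + spread


def moving_average(values, window):
    """Trailing moving average; the first window-1 periods have no value."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


# Hypothetical monthly employment in thousands (NOT the BLS data).
employment = [400.0, 404.0, 402.0, 398.0, 399.0, 390.0, 385.0, 371.0]

changes = pct_change(employment)               # the gray line
low, high = normal_band(changes)               # the yellow band
smoothed = moving_average(changes, window=6)   # the red line
outside = [c for c in changes if not low <= c <= high]

print(f"band: {low:.2f}% to {high:.2f}%, points outside the band: {len(outside)}")
```

Under a roughly normal distribution, about 95% of points should fall inside a 1.96 standard deviation band, which is why a run of points below it stands out.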

Taking the ‘higher’ view of year-over-year employment numbers, we can see that the decline in Newspaper employment started in 1986 (24 years ago)! That is not to say that the rapid expansion of internet use this past decade hasn’t had an impact on the Newspaper industry. But if the sole basis for the health of an industry is the number of employees it has, then the newspaper business has been in trouble for nearly a quarter of a century, and there are larger forces at play than citizen journalism and Google et al. “stealing” newspaper content.

So, what conclusions do the data support? Newspaper sector employment peaked (on an absolute basis) in 1990. On a percentage growth basis, employment peaked in the mid-80s and has been declining ever since. The rate of loss in employment has accelerated in the last 2-3 years. It’s unclear from the data, since it is a single measure, what is driving the changes described. It’s also unclear, despite the link-baiting headline given to the Alley Insider chart, whether or not this is the end of the newspaper business. On an employment basis, the sector has contracted to early-1950s levels, but we lack the proper context to understand what the implications of that contraction are.

By: Clint | Posted in visualization | 2 Comments »
U.S. Census Projections 2005-2030

December 29, 2009

So the other night I was looking for some data to play with and Bob Page recommended taking a look at U.S. census data to find an amazing unknown correlation. Well, I did pull down some census data from Many Eyes, but alas I haven’t found an amazing correlation. What I ended up with instead is a relatively simple yet interesting exploration of projected U.S. populations, by age, from 2005 to 2030.

In the visualization below, ages are grouped into ‘Age Groups’ (basically minor, adult and middle-aged +), then into more typical age bands (familiar to marketers, I’m sure) and down to individual ages. The top ‘row’ of the visualization shows the projected population in 2030 compared to the population in 2005, visualized as bullet charts. On these bullet charts, the darker gray background as well as the black reference lines mark the 2005 population, while the bullets show the projected population in 2030. Most folks are probably familiar with the story that America is aging – that our population is growing in the elder segments. You can see in the bullets for Age Groups that those 55 years and older show the most growth both in absolute terms (from 67 million in 2005 to nearly 111 million in 2030) and on a percentage basis: +65% vs +8% for people ages 18-54 and +16% for those aged 0-17. Looking at the Age Band bullets (horizontal), we can see that elderly growth is most concentrated in the 65-74, 75-84 and 85+ bands. In other words, most of the population growth is in retirement-aged people!

Ok, so now try clicking on the bullet (red bar) for people ages 55-64. Notice that the ‘Ages Detail Trend’ chart filters for that age band. The chart itself shows the percentage growth of a given year compared to the first year reported (e.g. % growth of population in year 2015 compared to 2005). If you are looking at ages 55-64 in the trend, notice the highly stratified trend lines. The younger end of that age band (55-58) shows relatively tame growth over the 25 year period compared to those aged 59-64.
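
The growth-versus-first-year calculation behind that trend chart is simple enough to sketch in Python. This is my own illustration, not the workbook's calculation: `growth_vs_base` is a hypothetical helper, and the only numbers used are the rounded 55+ totals quoted above.

```python
def growth_vs_base(population_by_year):
    """Percent growth of each year relative to the earliest year in the series."""
    base_year = min(population_by_year)
    base = population_by_year[base_year]
    return {
        year: 100.0 * (pop - base) / base
        for year, pop in sorted(population_by_year.items())
    }


# Ages 55+ in millions, rounded as quoted in the post (2005 and 2030 only).
aged_55_plus = {2005: 67.0, 2030: 111.0}

growth = growth_vs_base(aged_55_plus)
for year, pct in growth.items():
    print(f"{year}: {pct:+.1f}% vs 2005")
```

Because the inputs are rounded (the post says "nearly 111 million"), the result lands slightly above the +65% shown in the visualization.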

[Interactive Tableau visualization: projected U.S. population by age, 2005–2030]

FYI, this interactive visualization was built with Tableau Public, which is currently in private beta, and uses new features of Tableau 5.1 (also in beta) like native support for Bullet Charts – Hooray! Feel free to share the visualization by clicking the ‘Share’ button at the bottom of the vis and grabbing the embed code.

By: Clint | Posted in visualization | 3 Comments »
Creating A Pseudo Reference Line in Tableau

July 7, 2009

One of the weaknesses of Tableau is that you can’t add to a chart a reference line that is based on a custom calculation or another metric within the workbook. The calcs that are offered for reference lines are good and applicable in many situations, but sometimes they are just not right. In the short video below, Joe Mako explains how to fake a custom, dynamic reference line in Tableau.

Adding a sudo reference line in Tableau from Joe Mako on Vimeo.

NOTE: Joe directed me to his tutorial after I asked some questions about reference lines on Twitter yesterday; I’m simply passing along his great little tutorial. Please enjoy!

By: Clint | Posted in Tableau, Tableau Tips | No Comments »
Policing the Viz Police

March 23, 2009

Recently, Tableau published a well-meaning blog post to highlight some of the inherent problems with geography-based visualizations under their tongue-in-cheek “Viz Police” heading. They take issue with a recent visualization published by Media Cloud, a Harvard Law project, showing how various news outlets cover various countries around the globe. The interactive graphic allows you to choose up to three news organizations and then three different data sets to compare (top 10 search terms, top 10 term pivot and world map – the visualization Tableau chooses to discuss).

I don’t necessarily disagree with Tableau’s argument, but I think they made several errors in how they chose to communicate it.

Error #1: Bad Blogging Etiquette: The post provides no link to the Media Cloud project and the specific item under discussion. There is no native way for the reader to go back to Media Cloud and investigate the visualization on their own. I wasn’t familiar with Media Cloud so I actually had to Google it to get there.

Error #2: Poor Graphic Use: Tableau chose to use just a thumbnail (the same thumbnail that is provided by Media Cloud) of the infographic. Furthermore, they covered it up with their “Viz Police” badge making it impossible to get a decent view of the graphic. Finally, the way they incorporated their badge into the Media Cloud graphic (it looks like someone screen-shot a layered graphic out of Photoshop or similar) presupposes the bad nature of the graphic. In other words, poor execution of including the badge on the Media Cloud graphic gives the reader the, possibly false, impression that the map is a bad graphic.

Error #3: Lack of a Full-Size Graphic: Tableau did not provide a full-sized version of the Media Cloud graphic. Now, I run hi-res (1928×1208) so I’d argue that the original graphic is too small anyway. But when you look at the full-sized graphic sans “Viz Police” badge, the errors are not quite as egregious as indicated by Tableau. In the full-sized graphic the area bias still exists, but it’s clear that the UK is more saturated than the U.S. on the BBC graphic – when you spend some time looking at it.

Error #4: Misdirection. In attempting to show how Tableau’s solution (circles vs. density) is better, they bring in a completely different data set – “Net Internal Migration by State”. Now, Media Cloud does not provide the data behind their graphic, so if you wanted to create a comparative graphic in another tool you’d presumably have to jump through some hoops, either getting them to provide the data or estimating it yourself. But in either case you would not be comparing apples and steel ingots (apples & oranges are, in fact, too similar for the old adage to work), as Tableau is asking you to do in their post.

Error #5: Using area to encode value. In an error similar to the map-density one they are arguing against, Tableau’s example uses the area of the circles to encode some value – which is not even explained via a legend (the legend only explains the color encoding)! For those that may not know, we humans are not good at estimating area; it’s not what our visual systems are built to do. We tend to over-estimate large areas and underestimate small areas – remember, this is the basic argument against pie charts.
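
For context on what “encoding a value as area” means mechanically, here is a small Python sketch. It is a generic illustration, not a description of how Tableau sizes its marks, and `radius_for_value` is a name I made up.

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Radius that makes a circle's AREA proportional to value.

    Area grows with the square of the radius, so the radius has to
    scale with the square root of the value.
    """
    return max_radius * math.sqrt(value / max_value)


# A value one quarter the size of the maximum gets half the radius.
big = radius_for_value(100, max_value=100, max_radius=10)
small = radius_for_value(25, max_value=100, max_radius=10)
print(big, small)
```

The reader has to mentally undo that square root to get back to the value, which is part of why area comparisons are hard to judge by eye and why an explicit size legend matters.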

Error #6: Chart Junk. At best, chart junk obfuscates your data making it difficult to understand. At worst, it causes bias or error in judgment of the data. Well, in the Tableau map, the circles have a light-colored border. This border is more evident on some of the plot points than others, creating the misrepresentation that those points are somehow more important. Is the migration to Maine somehow more important than the migration to Minnesota? I don’t really know because they are roughly the same size and color BUT the border on the Maine circle is much clearer than the one on Minnesota – what does that mean?

Error #7: Breaking map conventions. I can’t speak for the rest of the world, but here in the U.S. a circle on a map generally means a population center – a city – and the size of that circle may indicate how big the city is, or all circles are the same size (i.e. no value encoded on the area). So, when I’m looking at a map like this, I expect the circles to reference a city and these circles do not; they reference a state – Tableau has just broken your mental model of a map! When you use a convention unconventionally and break the standard mental model, you typically end up creating cognitive dissonance. I’m not saying that it should never be done, but you have to be very careful. If the confusion throws something into sharp relief that might otherwise be obscured, ok, you’ve got a case to do it. But if all it does is create a buzzing between the ears that makes processing the information more difficult, you are better off not doing it.

So what? Why am I all in a huff? It’s not because I dislike Tableau – quite the opposite, I am an avid user. I do dislike poorly executed arguments. If an argument is not made cogently, it has holes in it; it looks sloppy and is therefore less effective. Tableau has an excellent point about the pitfalls of area-based information graphics, but they’ve shot themselves in the foot with how they argue it. That makes it less likely their readers will understand and trust the argument, which may well lead them not to apply the lesson in their own design efforts.

To be honest, I find the whole argument a bit disingenuous. The post argues against a specific type of area encoding – density encoding on geographic areas – but Tableau not only allows area encoding on plot types up to and including the ignoble pie chart, their geographic visualizations also allow density encoding via the data layers.

BTW, this is a follow-on to the comment I posted on Tableau’s blog. I wasn’t particularly happy with my comment so I rewrote it as a post here rather than editing the comment there.

Source: Media Cloud A Harvard Law | Berkman Center Project

Source: Tableau

By: Clint | Posted in Tableau | 6 Comments »
Tableau Tips – Building A Calendar-Based View

January 7, 2009

It’s a new year, and while I’ve effectively been on hiatus from this blog for about 4 months, it’s probably time to get started again.

We’ve been using Tableau at work for about three months, during which I had something of a rare experience (or at least rare to me). Still being a neophyte with Tableau, I was surprised that building a simple calendar view of data hadn’t occurred to the Tableau team. Purists will argue (and who’s to disagree) that calendars don’t make the best data visualizations (lack of ability to see trends, low data-to-ink ratio, blah blah blah), but when you have a set of users that prefer a calendar AND everyone is already trained to use/read/interpret a calendar, it’s a compromise I can live with.

Long story short, it was easy enough to build the calendar framework in Tableau, but I was still new enough to struggle with getting the data into the calendar in the way that I wanted. So, I reached out to the Sales Engineering team at Tableau (Thanks Ty!) and he helped me out. The surprising part was the mini-storm of attention that this created on the Tableau team. Apparently this type of visualization hadn’t occurred to them and it created some excitement. What follows is a combination of what I learned on my own and what Ty helped me to create.

For this exercise, we’ll be relying on some of the training data that comes along with Tableau – the ‘superstore data’.

This post is pretty in-depth (and long) but it assumes that you have a working knowledge of Tableau and SQL. If you are new to either or both, some of this may not make sense.

Step 1: Getting Set Up

  1. Open Tableau
  2. Start a new workbook
  3. Connect to the “Sample - Superstore Sales” Excel file

[Screenshot: New Tableau Workbook - Superstore Sales]

You may see some different dimensions and measures in here than what you see in your workbook. There are two reasons for this: there are a couple of custom dimensions and measures that we need to create along the way, AND you may have a different version of this file than I do.

Step 2: Building the Calendar Framework

  1. The first thing we need is a custom dimension, not because we’re going to use it right away but because we will need it later for building in some interactivity and it’s part of the calendar framework. This dimension will be called “YYMM” and is a simple concatenation of the 2-digit year and 2-digit month from the Order Date dimension.

    [Screenshot: Tableau Calculated Field dialog box]

    This dimension is simply a string and will return values like ‘0801’ (e.g. January 2008).

  2. Setting the stage…
    1. Right-mouse click and drag the “Order Date” dimension to the Columns pane. This should prompt you to select the type of aggregation to use – select “WEEKDAY(Order Date)”. This will get you your days of the week running across the top.
    2. Now drag “YYMM” to the Rows pane, followed by “YEAR(Order Date)” (which is the default aggregation for date dimensions), “MONTH(Order Date)” (remember that right-mouse drag?) and “WEEK(Order Date)”

      Right now you have a table that should look something like the following:

      [Screenshot: Tableau calendar table]

  3. Making the table into a calendar
    1. In the “Marks” pane, change the Marks drop down to “Square”
    2. Make a new custom dimension called “Day” with the formula DAY([Order Date]). (Make sure that “Day” is in the Dimensions pane and not the Measures pane after creating it.)
    3. Drag “Day” to the “Level of Detail” box in the “Marks” pane

      Now the Tableau stage probably looks something like this:

      [Screenshot: tableau calendar 2]

    4. Let’s put in a quick filter so that we can look at just one month; we’ll remove the filter later. Control drag (CTRL+Drag) “YYMM” from the Rows pane to the “Filters” pane. Put a check mark on “Exclude” and then scroll to the end of the list and take the check mark off of “0812”. Now we’re just looking at December 2008. Still doesn’t look like much of a calendar, does it? But it’s in there.
    5. We just have to manually adjust the size of the stage and the cells to get what we are looking for.
      1. Find the up/down handle on the Week of Order Date cell and drag it down so that the row height is about 1.5″
      2. Do the same for the Day of Week columns. We’re now getting closer; you can probably start to see the calendar format:

         [Screenshot: tableau calendar table 3]

      3. Add borders to enhance the calendar effect. Go to Format –> Borders
        1. Add a black, narrow border to Cell under the Default section of the Sheet–>Borders pane
        2. As an added bonus, you can now fine-tune your column width and row height to a square

        Your Tableau stage should now look an awful lot like a calendar, albeit an ugly blank one:

        [Screenshot: tableau calendar table 4]
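
If it helps to see the framework outside of Tableau, the same two ideas (a “YYMM” key and a week-by-weekday grid) can be sketched in Python. This is an analogue I wrote for illustration, not Tableau's calculation language; the function names are made up, and I am assuming weeks start on Sunday, which may not match your workbook's settings.

```python
import calendar
from datetime import date


def yymm(order_date):
    """2-digit year + 2-digit month, mirroring the "YYMM" dimension."""
    return order_date.strftime("%y%m")


def month_grid(year, month):
    """Weeks as rows and weekdays as columns; 0 marks an empty cell.

    Assumes the week starts on Sunday.
    """
    cal = calendar.Calendar(firstweekday=calendar.SUNDAY)
    return cal.monthdayscalendar(year, month)


print(yymm(date(2008, 1, 15)))

# December 2008, the month the quick filter isolates.
for week in month_grid(2008, 12):
    print(" ".join(f"{day:2d}" if day else "  " for day in week))
```

Each row of the grid plays the role of WEEK(Order Date) on Rows, each column plays the role of WEEKDAY(Order Date) on Columns, and the day number is what “Day” contributes on the Level of Detail.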

Step 3: Adding Data and Visual Analysis Cues

So now we have our basic calendar structure but there is no data in it and let’s face it, it’s U-G-L-Y. Let’s start by adding data.

  1. For the purpose of this exercise, we’re going to assume that we want to keep track of total sales (Gross Revenue) and Profit. We also need to add the day of the month to each cell. Initially, I had gone down the road of using Tableau annotations to create this but it’s cumbersome and breaks whenever the dates change. So the biggest thing that Ty in Sales Engineering helped me with was a custom measure that contained the Day of the Month, the total Sales and the Profit.
    1. So, we’re going to create a custom measure called “Day Calc” that concatenates the day of the month, the Sales total and the Profit total. The formula looks like this:

      str(MIN(DAY([Order Date])))+"

      s: "+str(round(SUM([Sales]),2))+"
      p: "+str(round(SUM([Profit]),2))

    2. Now, drag “Day Calc” from the Measures pane to the “Text” field in the Marks pane
    3. Right-mouse click a date cell and go to Format. Set the vertical alignment to “Top”. Horizontal alignment is your choice, as long as it’s left or right ;)

       [Screenshot: tableau calendar table 5]

      Ok, now we have actual data in the calendar, but what’s up with the color? The blue ain’t helping – maybe we should do something about that?

  2. Use color as an indicator of health
    1. Drag “Sales” from the Measures pane to the “Color” field on the Marks pane
    2. In the new “Colors” pane that appears below Measures, click the down arrow and select “Edit Colors”
    3. In the new dialog box, change the palette from “Automatic” to “Red-Green Diverging”
    4. Put a check on “Stepped Color” and change the number of steps to two (2)

       [Screenshot: tableau calendar table 6]

      The colors, which are splitting on the average revenue in this scenario, are still quite a bit saturated though.

    5. Change the opacity of the cell colors
      1. Back in the Marks pane, drag the opacity slider under Color to the left to increase the transparency of the color in order to desaturate the display. Find a level you’re happy with and go!

         [Screenshot: tableau calendar table 7]
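
To make the logic of this step concrete, here is a rough Python analogue of the “Day Calc” label and the two-step color split. It is a sketch under assumptions: the function names and the sales figures are invented, and the threshold follows the description above that the colors split on the average, which may not be exactly how Tableau picks the midpoint of a stepped diverging palette.

```python
from statistics import mean


def day_label(day, sales, profit):
    """Cell text: day of month, then sales and profit rounded to 2 places."""
    return f"{day}\ns: {round(sales, 2)}\np: {round(profit, 2)}"


def two_step_colors(daily_sales):
    """Green at or above the average of the visible days, red below it."""
    threshold = mean(daily_sales)
    return ["green" if sales >= threshold else "red" for sales in daily_sales]


# Hypothetical daily totals (not taken from the Superstore file).
sales_by_day = [1200.50, 310.00, 2750.25]

print(day_label(7, 1234.567, 89.1))
print(two_step_colors(sales_by_day))
```

Because the threshold depends on which days are in view, the same day can flip color when the filter changes to a different month, which is worth keeping in mind when reading the calendar.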

Step 4: Cleaning It Up

Now we have a calendar that gives us useful information and visual cues to the health of a particular day – hooray! But there is still a lot of chart junk showing that we don’t necessarily need to show.

Things that can probably be hidden (not removed because they are either needed to maintain the calendar framework or we will need them later to create interactivity).

  1. YYMM can be hidden
  2. Week of Order Date can be hidden

    These two items can be hidden by clicking on the dropdown arrow in their respective lozenges (yes, they’re called lozenges) in the Rows shelf and taking the check mark off of “Show Header”

  3. Additionally, we can remove the field labels. Right-mouse click on “Order Date” in the stage and select “Hide Field Labels for Columns” – Order Date should disappear. Right-mouse click on either “Year of Order Date” or “Month of Order Date” and select “Hide Field Labels for Rows” and both of those should now be hidden.
  4. You may want to resize the Year and Month columns at this point, but keep them visible: in the next post we’ll be making this calendar interactive, so you want to make sure to indicate which month and year is being viewed.
  5. VOILÀ! A calendar-based view in Tableau:

     [Screenshot: Completed Tableau Calendar View]

Stay tuned for the next post in this series where we’ll discuss taking this view and making it interactive!

Here’s an updated version of the calendar that I did when speaking to the Atlanta Tableau User Group:

[Interactive Tableau visualization: updated calendar view]

By: Clint | Posted in Tableau | 12 Comments »