He completed the 100m Sprint in 9.69 seconds. Running some simple arithmetic, we know that his average speed over the course of the race was 10.32 meters per second. We can convert this metric speed to a more familiar (to Americans) 23.09 MPH. Would it surprise you that in Bolt’s winning 200m race he was slightly faster, at 23.18 MPH? According to Sports Endurance, Bolt’s fastest 10 meter split in the 100 was 0.82 seconds, which works out to a peak speed of 27.27 MPH. Here’s a list of animals that are faster than Bolt.
Cheetah (70 MPH)
Pronghorn Antelope (61 MPH)
Wildebeest (50 MPH)
Lion (50 MPH)
Thomson’s Gazelle (50 MPH)
Quarter Horse (47.5 MPH)
Elk (45 MPH)
Cape Hunting Dog (45 MPH)
Coyote (43 MPH)
Gray Fox (42 MPH)
Hyena (40 MPH)
Zebra (40 MPH)
Mongolian Wild Ass (40 MPH)
Greyhound (39.35 MPH)
Whippet (35.5 MPH)
Domestic Rabbit (35 MPH)
Mule Deer (35 MPH)
Jackal (35 MPH)
Reindeer (32 MPH)
Giraffe (32 MPH)
White-Tailed Deer (30 MPH)
Warthog (30 MPH)
Grizzly Bear (30 MPH)
Domestic Cat (30 MPH)
Usain Bolt (27.27 MPH)
Elephant (25 MPH)
Let’s review. Here’s a sampling of things from above that are faster than Usain Bolt: Wile E. Coyote, Thumper, Bambi and Whiskers the cat. Not to mention bears, lions and Rudolph the Red-Nosed Reindeer.
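For anyone who wants to check the arithmetic at the top of this post, here is a minimal Python sketch. The function name is my own invention for illustration; the only inputs are the times and distances quoted above, plus the standard definition of a mile (1609.344 meters).

```python
# Conversion factor from meters per second to miles per hour.
MPS_TO_MPH = 3600 / 1609.344

def average_speed_mph(distance_m, time_s):
    """Average speed in MPH for a distance in meters and a time in seconds."""
    return distance_m / time_s * MPS_TO_MPH

hundred = average_speed_mph(100, 9.69)  # the 100m race
split = average_speed_mph(10, 0.82)     # fastest 10 meter split

print(f"100m average:   {hundred:.2f} MPH")
print(f"best 10m split: {split:.2f} MPH")
```

The results agree with the figures quoted above to within about a hundredth of an MPH; the small differences come down to where the intermediate values get rounded.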
Update: Smee correctly points out that while a Histogram and a Bar Chart are superficially similar, they are not really similar at all. I am properly chastised and have updated the post appropriately.
Look! An excuse to embed a Hulu video on the blog – w00t! (If you go full screen on the video, you can see the element under discussion.)
So, Hulu (a favorite of mine) rolled out a new version of their online video player. Kudos to the team! Generally speaking I like the new player – it’s cleaner than the previous one (not sure how they made it more sparing than it was before) and the dynamic bit rate is pretty damn awesome, but there are a few things that I don’t like (the new loading spinner, the inauthentic-feeling "is this ad relevant to you?" and this newfangled thing they call a "Heat Map").
Here’s what wikipedia has to say about heat maps:
A heat map is a graphical representation of data where the values taken by a variable in a two-dimensional map are represented as colors. A very similar presentation form is a tree map. The term is also used to mean its thematic application as a choropleth map.
Not a bad definition, especially for my purposes here. Two aspects of the definition are key here:
“…values taken by a variable in a two-dimensional map…”
“…are represented as colors.”
So the basic structure of the map is two dimensional – perhaps a Cartesian coordinate system and the values of the variable are represented as a color. What does something like that look like? Here is an example:
We have a basic x (horizontal) – y (vertical) coordinate system and then blobs of color represent where users spend the most attention (as measured by eye-tracking). The deep, red colors indicate high-attention areas while the cooler, more shallow colors (green) represent less attention.
Here’s what Hulu is calling a heat map:
Wait … what? Hulu says this is a heat map? Unfortunately, it only has one dimension (time) for its variable (Popularity) and it is monochromatic – a stepped gray scale. It has failed all the tests contained in the heat map definition above. Oh, and it’s a Bar Chart.
Going back to our favorite source of definitions, Wikipedia has this to say about bar charts:
A bar chart or bar graph is a chart with rectangular bars with lengths proportional to the values that they represent. The bars can also be plotted horizontally.
Bar charts are used for plotting discrete (or ‘discontinuous’) data i.e. data which has discrete values and is not continuous. Some examples of discontinuous data include ‘shoe size’ or ‘eye colour’, for which you would use a bar chart. In contrast, some examples of continuous data would be ‘height’ or ‘weight’. A bar chart is very useful if you are trying to record certain information whether it is continuous or not continuous data.
In this case, we are probably looking at a simplified histogram, drawn as a bar chart, where the bar height (rather than the bar area, as in a true histogram) equals the frequency of the interval.
Here’s an example of a bar chart:
This looks a lot more like what Hulu is doing in their so-called heat map.
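To make the "bar height equals the frequency of the interval" idea concrete, here is a rough Python sketch. I have no idea how Hulu actually measures popularity, so the timestamps, the one-minute interval and the function names are all assumptions made up for illustration.

```python
from collections import Counter

def popularity_bars(view_times_s, episode_length_s, bin_width_s=60):
    """Count views per fixed-width time interval; each count is one bar height."""
    n_bins = -(-episode_length_s // bin_width_s)  # ceiling division
    counts = Counter(
        int(t // bin_width_s) for t in view_times_s if 0 <= t < episode_length_s
    )
    return [counts.get(i, 0) for i in range(n_bins)]

def render(bars):
    """Draw the bars as rows of text, one row per interval."""
    return "\n".join(f"{i:>3} | {'#' * height}" for i, height in enumerate(bars))

# Hypothetical timestamps (seconds into the episode) at which someone was watching.
sample = [5, 20, 61, 65, 70, 118, 130, 131, 135, 140, 200]
bars = popularity_bars(sample, episode_length_s=240)
print(render(bars))
```

One dimension (time), one value per interval, one bar per value. No second dimension and no color scale are needed, which is exactly why this is a bar chart and not a heat map.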
So why call it a heat map when it’s clearly a bar chart?
If it’s a naming thing, why not call it the Popularity Curve? I get that bar chart is business-boring. But “heat map” is an established visualization concept with concrete rules, and it’s nothing like a bar chart.
Now, I’m not saying that the underlying concept here is bad – I think it may be quite interesting (more on that later).
All I’m saying is: why call something what it isn’t when, on the one hand, that misnomer already exists as its own unique concept and, on the other hand, the original thing is probably the best solution anyway? This isn’t a Shakespearean issue (calling a rose by any other name … I’m paraphrasing here). A heat map is not a bar chart and no amount of wishing will make it so. They are designed to solve different problems.
So, Hulu, let’s call things what they are shall we? What you’ve got there is a bar chart and no amount of wishing will make it a heat map. The bar chart is a good solution – go with it and spend a little more time coming up with a better name for the product than a misapplied name for a completely different visualization technique.
So what’s the purpose of this thing anyway? It’s seemingly obvious: show users where the most attention is paid in a particular show. Straightforward.
But, why?
Content discovery.
If you’ve never seen Glee before, this tool will (hopefully) show you where all the good bits are – according to all the other viewers. Or, if you missed Saturday Night Live with Betty White over the weekend and need to be prepared to talk about it at the water cooler on Monday then the popularity curve will show you all the skits that folks found funny and maybe you can save some time by only watching those sketches instead of the whole episode. This encourages both new and regular viewers to watch more content and to find new content that they enjoy. Pretty cool right?
So what should Hulu call this utility (assuming that you agree that “heat map” is the wrong name for it)?
Yesterday, the Alley Insider published the chart at left as their “Chart of the Day”. The chart shows trends in the number of people employed by the Newspaper industry from 1947 – September 2009 as estimated by the Bureau of Labor Statistics. At first blush it seems ok, but the title (The End of Newspapers) begs for closer inspection.
The issue here, if you haven’t already spotted it, is that the y-axis starts at a non-zero value (200K). While it is acceptable, under certain conditions, to use a non-zero y-axis this isn’t one of them.
A non-zero y-axis on a trendline is typically used to expose patterns in the trend that otherwise would not be visible, in cases where the absolute value of any given point on the line is not that important. This trendline already has a pronounced pattern, so a truncated axis does not expose any new information. If you don’t intentionally look at the values on the y-axis, then you might assume that the bottom of the chart is zero and that the employment numbers for newspaper publishers are quickly approaching bottom. That is simply not true, so the chart is being used to lie to you.
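Here is a small Python sketch of how much a truncated axis can exaggerate a decline. The 200K axis start comes from the chart; the peak, latest and axis-maximum values are hypothetical round numbers I picked for illustration, not the BLS figures.

```python
def apparent_height(value, axis_min, axis_max):
    """Fraction of the plot's height at which a value is drawn."""
    return (value - axis_min) / (axis_max - axis_min)

def apparent_drop(start, end, axis_min, axis_max):
    """Drop from start to end as a fraction of the start's drawn height."""
    h_start = apparent_height(start, axis_min, axis_max)
    h_end = apparent_height(end, axis_min, axis_max)
    return (h_start - h_end) / h_start

# Hypothetical employment figures in thousands (assumed, not the BLS values).
peak, latest, axis_max = 450, 280, 500

true_drop = apparent_drop(peak, latest, 0, axis_max)         # zero-based axis
truncated_drop = apparent_drop(peak, latest, 200, axis_max)  # axis starts at 200K

print(f"zero-based axis:       {true_drop:.0%} drop")
print(f"axis starting at 200K: {truncated_drop:.0%} drop")
```

With these made-up numbers, a 38% decline is drawn as though it were a 68% decline. The data is the same; only the axis changed.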
Now, I’m not actually arguing that the newspaper business isn’t in trouble. Circulation is down, ad revenue is down, yada yada yada. Looking at the graph, we can see that the most recent months have the same employment levels as those of the early 1950s. But again, without some context, the number of people employed in the industry probably isn’t the best indicator of sector health.
So here’s how I’d look at this single measure of the Newspaper industry:
The two sets of graphs show monthly and annual employment estimates for the Newspaper industry from 1947 to 2009. The first chart in the monthly or yearly block simply trends the number of employees from period to period across the entire set. The variable width of the line also encodes the number of employees and is used to reinforce the relative weight of one period compared to another. The yellow band on the employment trend charts covers the minimum number of employees to the maximum – basically the vertical height covered by the Alley Insider chart. You can see that there is quite a bit of white space underneath the yellow band, and therein lies the chart lie. Changing the start value of the y-axis doesn’t expose any new insight about the trend; it exaggerates the line slope and makes it seem as though the employment numbers are bottoming out – which they are not.
The second chart in each block shows the percent (%) change period-over-period (M/M or Y/Y) in number of employees. In this case, the yellow band shows the normal variation (mean +/- 1.96 standard deviations). For the monthly chart, the gray line is the actual M/M values and the red line is the 6 month moving average. It’s interesting that, in the monthly M/M trend, the vast majority of points are within the normal variation. It’s not until November 2007 that we see a strong downward pressure in the M/M numbers. In the yearly chart, the yellow band is still normal variation as defined above and the gray line is the actual Y/Y values. In this case, the red line represents the 5 year moving average of Y/Y change.
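The three calculations behind these charts (period-over-period change, the normal variation band and the moving average) can be sketched in a few lines of Python. The employment series below is hypothetical, the function names are mine, and I am assuming a sample standard deviation and a trailing moving average; the charts may have been built with slightly different choices.

```python
from statistics import mean, stdev

def pct_change(values):
    """Period-over-period percent change between consecutive values."""
    return [(b - a) / a * 100 for a, b in zip(values, values[1:])]

def normal_band(changes, z=1.96):
    """Return (low, high) as mean -/+ z sample standard deviations."""
    m, s = mean(changes), stdev(changes)
    return m - z * s, m + z * s

def moving_average(values, window):
    """Trailing moving average; the first window-1 points produce no value."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]

# Hypothetical monthly employment in thousands (assumed, not the BLS values).
employment = [400, 402, 401, 403, 404, 402, 403, 390, 389, 391]

changes = pct_change(employment)
low, high = normal_band(changes)
# Positions in `employment` whose change from the prior month falls outside the band.
outside = [i + 1 for i, c in enumerate(changes) if not low <= c <= high]
smoothed = moving_average(changes, window=6)

print(f"band: {low:.2f}% to {high:.2f}%")
print("months outside the band:", outside)
```

In this toy series only the one sharp monthly drop lands outside the band, which is the same kind of signal the yellow band is meant to surface in the real charts.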
Taking the ‘higher’ view of year-over-year employment numbers, we can see that the decline in Newspaper employment started in 1986 (24 years ago)! That is not to say that the rapid expansion of internet use this past decade hasn’t had an impact on the Newspaper industry, but if the sole basis for health of an industry is the number of employees it has, then the newspaper business has been in trouble for nearly a quarter of a century and there are larger forces at play than citizen journalism and Google et al. “stealing” newspaper content.
So, what conclusions do the data support? Newspaper sector employment peaked (on an absolute basis) in 1990. On a percentage growth basis, employment peaked in the mid 80s and has been declining ever since. The rate of loss in employment has accelerated in the last 2-3 years. It’s unclear from the data, since it is a single measure, what is driving the changes described. It’s also unclear, despite the link-baiting headline given to the Alley Insider chart, whether or not this is the end of the newspaper business. From an employment basis, the sector has contracted to early 1950s levels, but we lack the proper context to understand what the implications of that contraction are.
So the other night I was looking for some data to play with and Bob Page recommended taking a look at U.S. census data to find an amazing unknown correlation. Well, I did pull down some census data from Many Eyes, but alas I haven’t found an amazing correlation. What I ended up with instead is a relatively simple yet interesting exploration of projected U.S. populations, by age, from 2005 to 2030.
In the visualization below, ages are grouped into ‘Age Groups’ (basically minor, adult and middle-aged +), then into more typical age bands (familiar to marketers I’m sure) and down to individual ages. The top ‘row’ of the visualization shows the projected population in 2030 compared to the population in 2005, visualized as bullet charts. On these bullet charts, the darker gray background as well as the black reference lines mark the 2005 population while the bullets show the projected population in 2030. Most folks are probably familiar with the story that America is aging, that our population is growing in the elder segments. You can see in the bullets for Age Groups that those 55 years and older show the most growth both in absolute terms (from 67 million in 2005 to nearly 111 million in 2030) and on a percentage basis: +65% vs +8% for people ages 18-54 and +16% for those aged 0-17. Looking at the Age Band bullets (horizontal), we can see that elderly growth is most concentrated in the 65-74, 75-84 and 85+ bands. In other words, most of the population growth is in retirement-aged people!
Ok, so now try clicking on the bullet (red bar) for people ages 55-64. Notice that the ‘Ages Detail Trend’ chart filters for that age band. The chart itself shows the percentage growth of a given year compared to the first year reported (e.g. % growth of population in year 2015 compared to 2005). If you are looking at ages 55-64 in the trend, notice the highly stratified trend lines. The younger end of that age band (55-58) shows relatively tame growth over the 25 year period compared to those aged 59-64.
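The growth calculation used in the trend chart (each year compared to the first year reported) is simple enough to sketch in Python. The function name is my own, and the only data used is the pair of rounded 55-and-older totals mentioned above, so treat this as an illustration rather than the actual Tableau calculation.

```python
def growth_vs_base(series):
    """Percent growth of each year relative to the first year in the series."""
    years = sorted(series)
    base = series[years[0]]
    return {year: (series[year] - base) / base * 100 for year in years}

# Rounded totals for ages 55+ in millions, as quoted in the post.
aged_55_plus = {2005: 67, 2030: 111}

growth = growth_vs_base(aged_55_plus)
for year, pct in growth.items():
    print(f"{year}: {pct:+.1f}%")
```

Because the inputs are rounded to the nearest million, the result lands slightly above the +65% quoted above; the first year is always 0% by construction.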
FYI, this interactive visualization was built with Tableau Public, which is currently in private beta, and uses new features of Tableau 5.1 (also in beta) like native support for Bullet Charts – Hooray! Feel free to share the visualization by clicking the ‘Share’ button at the bottom of the vis and grabbing the embed code.