On the Importance of Double-Checking Graphs

Published by Waldo Jaquith on February 23, 2009 in Meta News. 33 Comments

The Daily Progress has twice used extremely misleading charts in the past few weeks, and this seems like a good opportunity to highlight the importance of being a critical reader of charts and graphs. In both instances they employed bubble charts, a type of chart that is often avoided because people have an awfully hard time understanding them. At right is the chart that the paper employed on Sunday in Ranchana Dixit and Brandon Shulleeta’s “Can we afford our future?,” which tracked expensive capital improvement projects being planned in Charlottesville and Albemarle. As you can see, a minuscule amount is being spent on water and sewer improvements in comparison to Places29—in fact, more is being spent on Places29 than the other three combined. But the dollar values don’t add up. What’s going on?

The problem here is that people are really bad at comparing area. We do well with comparing colors, lengths, shapes, etc., but our brains are not well-equipped to figure out bubble charts. Doubly problematic is that the math behind the visuals in this chart is flat-out wrong. The area represented by the bottom bubble is $45.5M so, proportionally, the top bubble should represent $1,300M, not $312M. It’s 320% too big. Likewise, the middle two bubbles are significantly too large. The effect is to significantly exaggerate the disparity of the area’s spending priorities. Though having that big “$312 million” graphic above the fold on the front page of the paper is eye-catching, it’s misleading.
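To make the geometry concrete, here is a minimal sketch of the two ways a bubble can be sized. It uses only the $312M and $45.5M figures from the chart; the function names and the reference radius of 10 units are assumptions made up for illustration, not anything taken from the paper's graphics software.

```python
import math


def radius_by_area(value, ref_value, ref_radius):
    """Radius that makes the circle's AREA proportional to the value."""
    return ref_radius * math.sqrt(value / ref_value)


def radius_by_diameter(value, ref_value, ref_radius):
    """Radius that makes the circle's WIDTH proportional to the value.

    This is the mistake described above: the width is right,
    but the area grows with the square of the value.
    """
    return ref_radius * (value / ref_value)


def circle_area(radius):
    return math.pi * radius ** 2


smallest, largest = 45.5, 312.0  # millions of dollars, from the chart
ref_radius = 10.0                # hypothetical radius for the smallest bubble

true_ratio = largest / smallest

# How many times bigger the large bubble looks, by area, under each method.
area_scaled_ratio = (
    circle_area(radius_by_area(largest, smallest, ref_radius))
    / circle_area(ref_radius)
)
diameter_scaled_ratio = (
    circle_area(radius_by_diameter(largest, smallest, ref_radius))
    / circle_area(ref_radius)
)

print(f"ratio in the data:            {true_ratio:.2f}")
print(f"area ratio, sized by area:    {area_scaled_ratio:.2f}")
print(f"area ratio, sized by width:   {diameter_scaled_ratio:.2f}")
```

Sizing by area reproduces the ratio in the data. Sizing by width squares it, which is why the big bubble swallows the others.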

Here’s a side-by-side of the chart as it was presented in the Progress and how it should have looked:

As you can see, the effect isn’t nearly as striking. But it does have the benefit of being correct.

Stats-geek blog Junk Charts has a whole category to keep track of misleading bubble charts, because they’re so commonly misused. Understand, though, they’re not being misused maliciously by newspapers; they’re just easy to get wrong. In this case, the paper used an area-based chart with math that applies only to diameter, leading to a geometric exaggeration of the data. The solution for the Progress is to use a simpler chart. (I’ve mocked it up as a bar chart, which is a much better format for this data.) This is certainly a minor sin, but it’s the sort of thing that (quite wrongly) leads readers to cry “bias!” when “mistake!” is a more appropriate response. And if you read this story in the paper yesterday, and didn’t think something was funny about that graph, the solution for you is to be a critical reader of charts and graphs. Look at the numbers, do a quick comparison and see if you’ve been given a false impression by the visuals.

33 Responses to “On the Importance of Double-Checking Graphs”


Tim McCormack says:

February 23, 2009 at 10:59 pm

This is why I take the square root of the popularity of a tag when displaying a tag cloud.
Elux Troxl says:

February 24, 2009 at 8:15 am

If you compare the diameters and not the area of the original graphs, they are correct.
Harry Landers says:

February 24, 2009 at 8:44 am

Although what Elux Troxl says is true, it’s not particularly helpful if the goal of the graph is to help the reader to make an easy visual comparison among several bits of information. Most of us are in no position to quickly evaluate comparative diameters. A diameter is just a line, so if they wanted to use that kind of metric, they would have better served their readers by displaying a bar graph (i.e. lines), as Waldo suggests. Instead, they seem to have camouflaged the line inside of circles.
Waldo Jaquith says:

February 24, 2009 at 8:59 am

If you compare the diameters and not the area of the original graphs, they are correct.

That’s right, as I explained:

In this case, the paper used an area-based chart with math that applies only to diameter leading to a geometric exaggeration of the data.
Keith Davis says:

February 24, 2009 at 10:12 am

I also find it odd that the water supply figure in this chart has a different level of precision. Any idea why DP thought they should throw the extra $800,000 on that one, but not provide equivalent levels of precision on the other three? (For those who think I missed the fourth one, it has the same number of significant figures, so I am giving DP the benefit of the doubt.)
ead says:

February 24, 2009 at 10:42 am

Kind of reminds me of the accounting practices leading to these numbers.
Cville Eye says:

February 24, 2009 at 6:56 pm

“Bubble graphs are like scatter graphs, with a third variable size. They let you chart three variables in two dimensions.” (http://74.125.47.132/search?q=cache:XUwJg6lo_hwJ:www.graphicsserver.com/com_products/GraphChoiceBubble.aspx when to use bubble charts&hl=en&ct=clnk&cd=10&gl=us) Why was this type of chart used to display this linear data, I have no idea. Maybe the chart maker would like to weigh-in.
oldvarick says:

February 24, 2009 at 9:27 pm

Waldo, thanks for the enlightenment. Never was a math major, so I take a lot of this sort of thing at face value. There are probably a lot of other folks like me that think that bubbles can’t lie (or be mistakenly misproportioned….)
Michael Strickland says:

February 25, 2009 at 4:33 am

It’s funny you should run this today – production got dragged out extremely late at NYU’s paper last night, and we ended up needing to make a graph for the front page at around 4:30 a.m.

We flipped the numbers in the pie chart, putting 35.4% in the big chunk and 64.6% in the little chunk.

10,000 copies and a 7 a.m. email from the university spokesman later, and we realized. Of course, there’s a difference between poor design and sleep deprivation.
Betty Mooney says:

February 25, 2009 at 6:40 am

Besides the fact that the $142 million for the water supply is a complete guess not based on any accurate data and could climb to over $200 million or even higher if the current proposal to build the new dam and new up-hill pipeline isn’t modified.
Majung says:

February 26, 2009 at 12:26 pm

I have no idea what you’re fussing about here. Not only do the bubbles look more or less proportional to my eyes, but your “fixed” bubbles are flat out wrong. To wit: 312 / 45 = approx. 7. That’s SEVEN times a larger amount of money. Do your bubbles even remotely look proportional? Ah, nope.

This does not mean I am providing any support for the positions of the Daily Regress article, but I fail to see what you’re trying to say.
Waldo Jaquith says:

February 26, 2009 at 12:58 pm

Not only do the bubbles look more or less proportional to my eyes, but your “fixed” bubbles are flat out wrong. To wit: 312 / 45 = approx. 7. That’s SEVEN times a larger amount of money. Do your bubbles even remotely look proportional? Ah, nope.

You’ve proved my point about the difficulty of estimating area proportionally. :)

The ratio between the largest and smallest numbers is actually 6.86:1, pretty close to the 7:1 you estimated. My largest bubble is 5,539 pixels, and my smallest is 803 pixels, a ratio of 6.89:1, or about 1% from perfect. The DP’s biggest graph is 31,400 pixels, and their smallest is also 803 pixels, a ratio of 39.1:1, which exaggerates the difference significantly, to put it kindly.

But I’ve got to give you credit: you doubted my charts. So I see the lesson of this blog entry has been learned. :)
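For anyone who wants to redo that check, here is a rough sketch using the pixel counts and dollar figures quoted in this comment. The helper name is made up for illustration, and the pixel areas are taken as given rather than re-measured.

```python
def compare_ratios(big_area, small_area, big_value, small_value):
    """Return (drawn ratio, data ratio, exaggeration factor).

    An exaggeration factor of 1.0 means the bubble areas match the data.
    """
    drawn = big_area / small_area
    actual = big_value / small_value
    return drawn, actual, drawn / actual


# Dollar figures in millions; bubble areas in pixels, as quoted above.
corrected = compare_ratios(5539, 803, 312.0, 45.5)
printed = compare_ratios(31400, 803, 312.0, 45.5)

for label, (drawn, actual, factor) in (("corrected", corrected),
                                       ("as printed", printed)):
    print(f"{label}: drawn {drawn:.2f}:1, data {actual:.2f}:1, "
          f"exaggeration x{factor:.2f}")
```

The corrected bubbles come out within about 1% of the data, while the printed ones overstate the gap between the largest and smallest project by a factor of more than five.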
Majunga says:

February 27, 2009 at 12:41 am

Waldo, I’m sure your ‘surface areas’ are PhotoShop perfect! But then, maybe you think Monet sucks and Renoir should have stayed in the Limoges Porcelain factory?

Look, there’s so much deception in the media today, this is just a distraction. How’bout you do a story on the ethical sellout of NPR?
Cville Eye says:

February 27, 2009 at 3:08 am

The purpose of graphing is to show visual relationships between similar classes of information. The DP’s graph does not. As Waldo said, a simple bar graph (or stacked) would have been sufficient to convey the information in the article at a glance. If people have to start measuring the graph entities themselves, then why have a graph?
Majung says:

February 27, 2009 at 9:18 am

“The purpose of graphing is to show visual relationships between similar classes of information. The DP’s graph does not.”

Yes it does.
Waldo Jaquith says:

February 27, 2009 at 11:05 am

I’m afraid that I don’t know how to make it any clearer. Your complaint was that such a graph needs to be proportional. I demonstrated to you that it is not. Now you say that it doesn’t matter. If you don’t think that accuracy is important in reporting, then I can’t see that anything further can come out of this exchange.

But I would like to provide you with this graph indicating how seriously Barack Obama defeated John McCain in the general election:

Obama: ================================= (52.9%)
McCain: = (45.7%)
Majung says:

February 27, 2009 at 12:37 pm

Waldo: you’re the one that is complaining. Remember?

But then you show a “graph” that is incredibly skewed, since there’s absolutely no sense of proportion!!! This is what you should have shown:

Obama (52.9%): =====================================================
McCain (47.7%): ===============================================
Waldo Jaquith says:

February 27, 2009 at 12:45 pm

Now you’re getting it. :)
Majunga says:

February 27, 2009 at 12:45 pm

And that type of graph is fine for values that are in the same ballpark, like your Obama/McCain demo. But balloons are great to impart relative value size when some are many factors of another.

PS Your blog manager doesn’t support Chrome when it’s doing its ‘spam prevention’ routine…
Majunga says:

February 27, 2009 at 12:49 pm

>>> Waldo Jaquith
Feb 27th, 2009 at 12:45 pm
Now you’re getting it. :)

Eh? Double-check your own stat graphs before you post them, right Waldo? Did you learn the lesson?
Waldo Jaquith says:

February 27, 2009 at 1:02 pm

I can see that you’re not appreciating the irony of this joke, Majunga.
Majung says:

February 27, 2009 at 1:19 pm

*****
*******************************

The following is your type of graph for 50 M$ vs. 310 M$. The DP balloon graph imparts this proportionality quite a bit better, whereas your balloons don’t. Ironic, no? ;)
Majung says:

February 27, 2009 at 1:32 pm

* * * * *
* *
* *
* *
* *
* *
* *
* *
* * *
* * * *
* * * * * * * * * * * *

Here’s another one! At 46K$ household income per year, you live in the dingy little thing. At $312K$ per year, it’s a vinyl castle!
Majung says:

February 27, 2009 at 1:33 pm

Yeah, your blog screwed up my typewriter graphs!
Kris says:

February 27, 2009 at 2:23 pm

But what about the inverse — a graph comparing US earthquakes in the past month, for example? Going by what you say, Waldo, such a graph would have to show the actual power of the earthquake, rather than a logarithmic simplification. So if you’re looking at three earthquakes – 3.0, 4.0 and 5.0 Richter – you couldn’t have
—
—-
—–

but rather
–
———-
—————————————————————————————————-
Waldo Jaquith says:

February 27, 2009 at 3:10 pm

But what about the inverse — a graph comparing US earthquakes in the past month, for example? Going by what you say, Waldo, such a graph would have to show the actual power of the earthquake, rather than a logarithmic simplification.

Earthquakes are measured on logarithmic scales, Kris—you’d want to graph those on an appropriate scale. There is no argument to be made that non-logarithmic differences should be graphed on a logarithmic scale. But that wasn’t done here, anyhow—this is a case of graphing one-dimensional data (size of spending) within two dimensions (the size of a circle), thus geometrically exaggerating the data. This wasn’t done to editorialize or make a point—it was a simple error on the part of the DP in using an inappropriate type of chart.
Will M. says:

February 27, 2009 at 3:44 pm

It is mind-boggling to me that anyone is arguing that this graph is not misleading. The areas of the circles do not reflect the data, plain and simple. If you don’t want people to look at the area, don’t use a circle.
Majung says:

February 27, 2009 at 5:06 pm

It is mind-boggling to me that anyone is arguing that this graph is misleading. The circles reflect the data, plain and simple. If you don’t want people to understand the data, complain about it.
Will M. says:

February 28, 2009 at 3:43 am

Majung,

Let us pretend that if you spilled one cup of milk on the kitchen floor, it makes a circular puddle one foot in diameter. Now, if you were to spill two cups instead of one, would the puddle be two feet in diameter? No. The area will have doubled, but the diameter will have grown by a much smaller amount. (In our example it would be ~1.4 feet wide.) Does this make sense?
Majunga says:

February 28, 2009 at 10:34 am

Will – I think we all grasp the concept of area calculations, “dude”. The issue at hand takes it a step further: does it LOOK like the milk puddle is 2 times bigger? In many cases, especially when you might spill 7 cups of milk, the human eye will not see that because it has been trained to privilege width over area. Also, the human eye sees things in CONTEXT, as opposed to math / computers which require tons of smart programming to be able to even make the simplest contexts coherent.

Therefore, when using bubble graphs of stats that are so largely different, you cannot be just mathematically correct, you need to convey proportionality instantly to humans.

The question becomes then: when to use bubble graphs and how. This question can become quite involved and will call upon many disciplines, not just trigonometry. Personally, I find the DP graph to be properly used, especially since they included the actual numbers in the balloons, imparting mathematical correctness to it. Would I have made their bubbles that divergent? No, I think a .6 or .7 coefficient would have been more appropriate to my eyes…
Cville Eye says:

February 28, 2009 at 12:32 pm

If the purpose of the graph is to convey the information intended by the grapher to most viewers, this graph has failed, as evidenced by this disagreement between Majunga and Waldo. As stated before, bubble graphs are to be used to display three connected quantities in two dimensions. Here we have one quantity, so some other type of graph should be used to convey the idea.
Majunga says:

February 28, 2009 at 1:13 pm

Cville Eye – If the purpose of your post was to convey a proper evaluation of the situation, you have failed, as evidenced by the disagreement between you and me. As stated before, bubble graphs can be used with multiple sets of data, so this type of graph can be used to convey the idea.
Cville Eye says:

February 28, 2009 at 3:32 pm

Then we agree, Graphing, as well as Beauty, is in the eye of the beholder?

Comments are currently closed.
