*The Daily Progress* has twice used extremely misleading charts in the past few weeks, and this seems like a good opportunity to highlight the importance of being a critical reader of charts and graphs. In both instances they employed bubble charts, a type of chart that is often avoided because people have an awfully hard time understanding them. At right is the chart that the paper employed on Sunday in Ranchana Dixit and Brandon Shulleeta’s “Can we afford our future?,” which tracked expensive capital improvement projects being planned in Charlottesville and Albemarle. As you can see, a minuscule amount is being spent on water and sewer improvements in comparison to Places29; in fact, more is being spent on Places29 than the other three combined. But the dollar values don’t add up. What’s going on?

The problem here is that people are really bad at comparing area. We do well with comparing colors, lengths, shapes, etc., but our brains are not well-equipped to figure out bubble charts. Doubly problematic is that the math behind the visuals in this chart is flat-out wrong. The area represented by the bottom bubble is $45.5M so, proportionally, the top bubble should represent *$1,300M,* not $312M. It’s 320% too big. Likewise, the middle two bubbles are significantly too large. The effect is to significantly exaggerate the disparity of the area’s spending priorities. Though having that big “$312 million” graphic above the fold on the front page of the paper is eye-catching, it’s misleading.

Here’s a side-by-side of the chart as it was presented in the *Progress* and how it *should* have looked:

As you can see, the effect isn’t nearly as striking. But it does have the benefit of being correct.

Stats-geek blog Junk Charts has a whole category to keep track of misleading bubble charts, because they’re so commonly misused. Understand, though, they’re not being misused maliciously by newspapers; they’re just easy to get wrong. In this case, the paper used an area-based chart with math that applies only to *diameter*, leading to a geometric exaggeration of the data. The solution for the *Progress* is to use a simpler chart. (I’ve mocked it up as a bar chart, which is a much better format for this data.) This is certainly a minor sin, but it’s the sort of thing that (quite wrongly) leads readers to cry “bias!” when “mistake!” is a more appropriate response. And if you read this story in the paper yesterday, and didn’t think something was funny about that graph, the solution for *you* is to be a critical reader of charts and graphs. Look at the numbers, do a quick comparison, and see if you’ve been given a false impression by the visuals.
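For anyone who wants to see the diameter-versus-area mix-up in numbers, here is a minimal sketch. It uses the $45.5M and $312M figures from the chart; the 16-pixel reference radius and the function names are made up for illustration, and this is not a reconstruction of how the paper actually drew its bubbles.

```python
import math

def radius_for_value(value, ref_value, ref_radius):
    """Radius that keeps circle AREA proportional to value."""
    return ref_radius * math.sqrt(value / ref_value)

def circle_area(radius):
    return math.pi * radius ** 2

small, large = 45.5, 312.0  # $ millions, the smallest and largest figures in the chart
ref_radius = 16.0           # hypothetical radius in pixels for the smallest bubble

# Area-correct sizing: scale the radius by the square root of the value ratio.
r_correct = radius_for_value(large, small, ref_radius)

# The easy mistake: scale the radius (or diameter) directly by the value ratio.
r_linear = ref_radius * (large / small)

value_ratio = large / small
area_ratio_correct = circle_area(r_correct) / circle_area(ref_radius)
area_ratio_linear = circle_area(r_linear) / circle_area(ref_radius)

print(f"value ratio:                {value_ratio:.2f}")
print(f"area ratio, sqrt scaling:   {area_ratio_correct:.2f}")
print(f"area ratio, linear scaling: {area_ratio_linear:.2f}")
```

With square-root scaling the area ratio matches the dollar ratio. With linear scaling the area ratio is the dollar ratio squared, which is where the geometric exaggeration comes from.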

This is why I take the square root of the popularity of a tag when displaying a tag cloud.

If you compare the diameters and not the area of the original graphs, they are correct.
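On the tag cloud point above: a rough sketch of what taking the square root might look like, assuming that a word's footprint on the page grows roughly with the square of its font size. The function name, the 12-pixel base size, and the sample counts are all hypothetical.

```python
import math

def tag_font_sizes(counts, base_px=12.0):
    """Font size per tag, scaled by the square root of its count
    relative to the least-used tag, so that the tag's rough area
    (which grows with size squared) tracks its popularity."""
    smallest = min(counts.values())
    return {tag: base_px * math.sqrt(n / smallest) for tag, n in counts.items()}

# Hypothetical tag counts, chosen so the square roots come out even.
counts = {"charts": 4, "politics": 16, "water": 64}
sizes = tag_font_sizes(counts)

for tag, px in sizes.items():
    print(f"{tag}: {px:.0f}px")
```

A tag used 16 times as often ends up 4 times as tall, not 16 times, so its area is about 16 times larger.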

Although what Elux Troxl says is true, it’s not particularly helpful if the goal of the graph is to help the reader to make an easy visual comparison among several bits of information. Most of us are in no position to quickly evaluate comparative diameters. A diameter is just a line, so if they wanted to use that kind of metric, they would have better served their readers by displaying a bar graph (i.e. lines), as Waldo suggests. Instead, they seem to have camouflaged the line inside of circles.

That’s right, as I explained:

What I also find odd about this chart is that the water supply figure has a different level of precision. Any idea why DP thought they should throw the extra $800,000 on that one, but not provide equivalent levels of precision on the other three? (For those who think I missed the fourth one, it has the same number of significant figures, so I am giving DP the benefit of the doubt.)

Kind of reminds me of the accounting practices leading to these numbers.

“Bubble graphs are like scatter graphs, with a third variable size. They let you chart three variables in two dimensions.” (http://74.125.47.132/search?q=cache:XUwJg6lo_hwJ:www.graphicsserver.com/com_products/GraphChoiceBubble.aspx when to use bubble charts&hl=en&ct=clnk&cd=10&gl=us) Why this type of chart was used to display this linear data, I have no idea. Maybe the chart maker would like to weigh in.

Waldo, thanks for the enlightenment. Never was a math major, so I take a lot of this sort of thing at face value. There are probably a lot of other folks like me that think that bubbles can’t lie (or be mistakenly misproportioned….)

It’s funny you should run this today – production got dragged out extremely late at NYU’s paper last night, and we ended up needing to make a graph for the front page at around 4:30 a.m.

We flipped the numbers in the pie chart, putting 35.4% in the big chunk and 64.6% in the little chunk.

10,000 copies and a 7 a.m. email from the university spokesman later, and we realized. Of course, there’s a difference between poor design and sleep deprivation.

Besides, the $142 million for the water supply is a complete guess not based on any accurate data, and it could climb to over $200 million or even higher if the current proposal to build the new dam and new up-hill pipeline isn’t modified.

I have no idea what you’re fussing about here. Not only do the bubbles look more or less proportional to my eyes, but your “fixed” bubbles are flat out wrong. To wit: 312 / 45 = approx. 7. That’s SEVEN times a larger amount of money. Do your bubbles even remotely look proportional? Ah, nope.

This does not mean I am providing any support for the positions of the Daily Regress article, but I fail to see what you’re trying to say.

You’ve proved my point about the difficulty of estimating area proportionally. :)

The ratio between the largest and smallest numbers is actually 6.86:1, pretty close to the 7:1 you estimated. My largest bubble is 5,539 pixels, and my smallest is 803 pixels, a ratio of 6.90:1, or about 1% from perfect. The DP’s biggest graph is 31,400 pixels, and their smallest is also 803 pixels, a ratio of 39.1:1, which exaggerates the difference significantly, to put it kindly.

But I’ve got to give you credit: you doubted my charts. So I see the lesson of this blog entry has been learned. :)
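The pixel comparison above is easy to re-run. This is just a quick check using the dollar figures and pixel counts quoted in this thread; the variable names are mine.

```python
# Dollar figures ($ millions) for the largest and smallest bubbles.
spend_large, spend_small = 312.0, 45.5

# Pixel areas quoted above for the corrected chart and the DP's chart.
fixed_large_px, fixed_small_px = 5539, 803
dp_large_px, dp_small_px = 31400, 803

data_ratio = spend_large / spend_small
fixed_ratio = fixed_large_px / fixed_small_px
dp_ratio = dp_large_px / dp_small_px

# How far each chart's area ratio strays from the ratio in the data.
fixed_error = fixed_ratio / data_ratio - 1
dp_exaggeration = dp_ratio / data_ratio

print(f"data ratio:      {data_ratio:.2f}:1")
print(f"corrected chart: {fixed_ratio:.2f}:1 ({fixed_error:.1%} off)")
print(f"DP chart:        {dp_ratio:.1f}:1 ({dp_exaggeration:.1f}x the data ratio)")
```

The corrected chart lands within one percent of the data, while the DP's area ratio comes out at more than five times the ratio in the dollar figures.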

Waldo, I’m sure your ‘surface areas’ are PhotoShop perfect! But then, maybe you think Monet sucks and Renoir should have stayed in the Limoges Porcelain factory?

Look, there’s so much deception in the media today, this is just a distraction. How’bout you do a story on the ethical sellout of NPR?

The purpose of graphing is to show visual relationships between similar classes of information. The DP’s graph does not. As Waldo said, a simple bar graph (or stacked) would have been sufficient to convey the information in the article at a glance. If people have to start measuring the graph entities themselves, then why have a graph?

“The purpose of graphing is to show visual relationships between similar classes of information. The DP’s graph does not.”

Yes it does.

I’m afraid that I don’t know how to make it any clearer. Your complaint was that such a graph needs to be proportional. I demonstrated to you that it is not. Now you say that it doesn’t matter. If you don’t think that accuracy is important in reporting, then I can’t see that anything further can come out of this exchange.

But I would like to provide you with this graph indicating how seriously Barack Obama defeated John McCain in the general election:

`Obama: ================================= (52.9%)`

`McCain: = (45.7%)`

Waldo: you’re the one that is complaining. Remember?

But then you show a “graph” that is incredibly skewed, since there’s absolutely no sense of proportion!!! This is what you should have shown:

Obama (52.9%): =====================================================

McCain (47.7%): ===============================================

Now you’re getting it. :)

And that type of graph is fine for values that are in the same ballpark, like your Obama/McCain demo. But balloons are great to impart relative value size when some are many factors of another.

PS Your blog manager doesn’t support Chrome when it’s doing its ‘spam prevention’ routine…

>>> Waldo Jaquith

Feb 27th, 2009 at 12:45 pm

Now you’re getting it. :)

Eh? Double-check your own stat graphs before you post them, right Waldo? Did you learn the lesson?

I can see that you’re not appreciating the irony of this joke, Majunga.

*****

*******************************

The following is your type of graph for 50 M$ vs. 310 M$. The DP balloon graph imparts this proportionality quite a bit better, whereas your balloons don’t. Ironic, no? ;)

* * * * *

* *

* *

* *

* *

* *

* *

* *

* * *

* * * *

* * * * * * * * * * * *

Here’s another one! At 46K$ household income per year, you live in the dingy little thing. At 312K$ per year, it’s a vinyl castle!

Yeah, your blog screwed up my typewriter graphs!

But what about the inverse — a graph comparing US earthquakes in the past month, for example? Going by what you say, Waldo, such a graph would have to show the actual power of the earthquake, rather than a logarithmic simplification. So if you’re looking at three earthquakes – 3.0, 4.0 and 5.0 Richter – you couldn’t have

`---`

`----`

`-----`

but rather

`-`

`----------`

`----------------------------------------------------------------------------------------------------`
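The two sets of bars above come straight from the logarithmic definition: each whole step in magnitude is a tenfold increase in measured amplitude. A small sketch, using the three example magnitudes from this comment (the variable names are just for illustration):

```python
magnitudes = [3.0, 4.0, 5.0]
base = min(magnitudes)

# Bar lengths if you plot the magnitude itself (the logarithmic scale).
log_scale_bars = [int(m) for m in magnitudes]

# Bar lengths if you plot amplitude relative to the smallest quake:
# each whole magnitude step multiplies the amplitude by ten.
relative_amplitude = [10 ** (m - base) for m in magnitudes]

for m, log_len, amp in zip(magnitudes, log_scale_bars, relative_amplitude):
    print(f"magnitude {m}: log bar {'-' * log_len}  amplitude x{amp:.0f}")
```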

Earthquakes are measured on logarithmic scales, Kris; you’d want to graph those on an appropriate scale. There is no argument to be made that non-logarithmic differences should be graphed on a logarithmic scale. But that wasn’t done here, anyhow: this is a case of graphing one-dimensional data (size of spending) within two dimensions (the size of a circle), thus geometrically exaggerating the data. This wasn’t done to editorialize or make a point; it was a simple error on the part of the DP in using an inappropriate type of chart.

It is mind-boggling to me that anyone is arguing that this graph is not misleading. The areas of the circles do not reflect the data, plain and simple. If you don’t want people to look at the area, don’t use a circle.

It is mind-boggling to me that anyone is arguing that this graph is misleading. The circles reflect the data, plain and simple. If you don’t want people to understand the data, complain about it.

Majunga,

Let us pretend that if you spilled one cup of milk on the kitchen floor, it makes a circular puddle one foot in diameter. Now, if you were to spill two cups instead of one, would the puddle be two feet in diameter? No. The area will have doubled, but the diameter will have grown by a much smaller amount. (In our example it would be ~1.4 feet wide.) Does this make sense?

Will – I think we all grasp the concept of area calculations, “dude”. The issue at hand takes it a step further: does it LOOK like the milk puddle is 2 times bigger? In many cases, especially when you might spill 7 cups of milk, the human eye will not see that because it has been trained to privilege width over area. Also, the human eye sees things in CONTEXT, as opposed to math / computers which require tons of smart programming to be able to even make the simplest contexts coherent.

Therefore, when using bubble graphs of stats that are so largely different, you cannot be just mathematically correct, you need to convey proportionality instantly to humans.

The question becomes then: when to use bubble graphs and how. This question can become quite involved and will call upon many disciplines, not just trigonometry. Personally, I find the DP graph to be properly used, especially since they included the actual numbers in the balloons, imparting mathematical correctness to it. Would I have made their bubbles that divergent? No, I think a .6 or .7 coefficient would have been more appropriate to my eyes…

If the purpose of the graph is to convey the information intended by the grapher to most viewers, this graph has failed, as evidenced by this disagreement between Majunga and Waldo. As stated before, bubble graphs are to be used to display three connected quantities in two dimensions. Here we have one quantity, so some other type of graph should be used to convey the idea.

Cville Eye – If the purpose of your post was to convey a proper evaluation of the situation, you have failed, as evidenced by the disagreement between you and me. As stated before, bubble graphs can be used with multiple sets of data, so this type of graph can be used to convey the idea.

Then we agree, Graphing, as well as Beauty, is in the eye of the beholder?