Recreating the NY Times Cancer Graph

This New York Times cancer graph is a beautiful piece of work.

NY Times cancer graphic

I wanted to see if we could reproduce it with everyday tools.

Excel reproduction of the NY Times cancer graphic

Click here to watch a screencast showing how it was done. Warning: the screencast is a little long (14 minutes) and a little unpolished. One cut, no retakes, banzai analytics!

Derek raised an interesting question about how to find the fonts used by the New York Times. While I don't think you can find a high-quality free version of these fonts (Helvetica Neue, Univers?), Microsoft has made some very good new fonts for Vista, and these are also available to Microsoft Office users through a compatibility pack. Here's a link, or google for "microsoft office compatibility pack". I recommend using these fonts.

Here's a version of the graph with these new fonts and more emphasis on getting the typography right.

Excel reproduction of the NY Times cancer graphic with better fonts

35 comments | Only the last 5 are shown


November 14, 2007
Javaun said:

Hi Derek. I too use Excel 2003, and so I guess I don't have the bar transparency feature that Shawn proposed to make the gridline appear to float over the bar. Still, Shawn's idea would work to make the gridline float over the bar but appear transparent on the background. He simply needs to change the dotted gridline color to white. The white will show briefly through the transparency (may appear off-white) but will be indistinguishable against the background. I'm guessing that for the NY Times graph, they did a rough mockup in Excel using ugly colors and ugly fonts, and then a designer traced it (to preserve the scale) in Illustrator and beautified it with color and fonts.


December 27, 2007
sesha said:

Great work. Keep posting to benefit many like me.
Can you also help me in constructing graphs on a McKinsey chart that we use at our office? My problem is to edit the text boxes and graphs every time I need to update the data.


December 27, 2007
Zach said:

Sesha, we have developed an approach for automatically updating PowerPoint slides (charts, text boxes, tables) from Excel spreadsheets. I'm not sure if that is exactly what you are referring to. We can discuss offline if it is.


January 7, 2008
Sarah said:

I created a similar graph using Jon Peltier's tornado graph as a starting point. I was able to get white gridlines on top of the bars by creating a dummy series and then adding y-error bars. I had the additional requirement of getting the Male and Female sides into a single chart, so I had to use a dummy series for the y axis anyway. Here is what it looks like: http://flickr.com/photos/saamiam/2176279190/


June 3, 2008
brandie said:

my father died of lung cancer...hahahha jking






Analytics Roundup: Square Pie of Death

NY Times: % of Americans who believe that after death...
Astonishingly awful square pie from the NYT, who are normally infographic innovators.

raganwald: Beware of the Turing Tar-Pit
Know the difference between general and specific in building tools.

0 comments







Square Pie in the Eye

The New York Times—normally a source of clear and interesting infographics—produced the following graphic over the weekend.

NY Times square pie graphic

This is bafflingly awful—it’s Tiger Woods carding a 90. Square pies are an infographic seasoning—they’re cilantro, not steak. Here are a few of the problems with this graphic:

The color choices are bad. The saturations between groups are considerably different. The yellow is highly saturated while the other colors are not. The increased saturation draws your attention to the yellow area, but this is just a category like the others. I’d imagine someone with red-green color blindness would have trouble distinguishing the other colors.

There’s a hole in the center. Presumably this indicates people who didn’t respond to the question, but this is not noted. There are no gridlines in the white section even though the non-responding group should be treated visually like the other groups.

It’s hard to compare the sizes of groups. People are better at comparing lengths than volumes. Mixing lengths and volumes (some of the response categories are arranged linearly, while the inner category is basically a volume, with a hole!) makes it nearly impossible for people to use their spatial skills to size up the differences. Asking people to compare lines and donuts is like asking whether you prefer the color blue or raw carrots. For the record, I prefer carrots.

If you’re interested in the concept of square pie charts, the place to start is at EagerEyes. If you want to learn how to make them yourself, check out our contest, results, and screencast.

The Times is still a source of great design and inspiration. Here’s another graphic they produced over the weekend; it shows cancer incidence, survival rate, and gender differences in a way that is clear, clean, and concise.

NY Times cancer graphic

7 comments | Only the last 5 are shown


July 30, 2007
Chris Gemignani said:

Tony, One of the really bad decisions here is how the pie is filled by spiraling in from the outside. Filling the square pie by filling a row of blocks horizontally until you reach the target would have been a little better.


July 30, 2007
Tony Rose said:

Yes, exactly. That spiraling is extremely difficult to follow and adds no value. Using a square pie and shading starting in the lower right for each value would have been better, but would have created five graphs. I think a bar graph would give the readers the same information and would cut down on the comprehension time.

Your post back in December shows an example:
http://www.juiceanalytics.com/writing/2006/12/solving-the-pie/


August 2, 2007
Jordan Lund said:

My initial reaction towards square pie charts, pixel charts, or whatever you want to call them is that they are horrible and unreadable.

However now that I've had time to think about it, they could work provided there were a style manual for setting them up.

For example:

1) Charts must be filled in from left to right from the largest value to the smallest value.

2) Data labels must be placed in the first column which contains a majority of that value.

3) It is allowable to complete a column with a smaller value if this will prevent other columns from being broken un-naturally.

So, taking the NYT data and applying the three rules I just created, you end up with this:

http://img293.imageshack.us/img293/5534/gridchartco5.jpg


August 12, 2007
Sherman Dorn said:

Okay, the square-pie nonsense trumps today's awful bubble-map on attacks in Afghanistan (http://www.nytimes.com/imagepages/2007/08/12/world/20070812_AFGHAN_GRAPHIC.html). Good article, bad infographics, no cookie for Times graphics folks today!


September 9, 2007
Bob said:

It's a visual gag.

It's supposed to look like the light at the end of a tunnel, not make it easier to compare the proportion of 4 categories (which even a seven year old can do, even without the aid of a chart, and certainly without complaining about how difficult it is).






TV Ratings and Online Audiences... Or, Where to Find Skeet Ulrich's Bio

The TV ratings system is broken. Everyone knows it, but nobody wants to admit it. Nielsen ratings struggle to accurately measure audience quantity (limited tracking of DVR usage and online viewers) and quality (are viewers engaged? are they skipping the ads?). However, admitting so would undermine the delicate balance TV networks share with their advertisers.

I caught an interesting segment on KCRW's "The Business" podcast about TV series that find themselves on the "bubble," i.e. at risk of getting canceled. The producer of CBS's Jericho, "a post-apocalyptic drama starring Skeet Ulrich" (shouldn't that description alone put it on the chopping block?), explained how they received a temporary stay of execution when their small but loyal audience protested network plans to cancel the show. The interview raised questions about the validity of Nielsen ratings and how a fervent online audience can bring additional perspective to the performance of a show.

All this talk of measurement gave me an itch to look at some real data. I tracked down the Nielsen audience size (Subscription required) for TV series over the 2006-2007 TV season. Then I pulled from comScore (a Juice client and leading source for data about Internet traffic and usage behaviors) the unique visitors and time spent on websites of TV shows over the same September to May time period.

I had a few questions I was curious about:

  1. Which shows have disproportionately larger internet audiences, an indicator of a loyal and rabid fan base? Are there other shows like Jericho that struggle to build a large TV audience, but have a strong online following?
  2. Which TV show sites have the most engaged audiences?
  3. What TV networks have been most successful at building online traffic to their sites? Which types of shows spawn online audiences?

The table below shows the top 20 TV series by ratio of monthly unique website visitors to average TV viewership. This metric suggests an ability to get viewers to look for more content, whether it is additional video, information about the actors, or discussion boards. If Jericho's 9.5 million TV viewers (tied for 48th overall) represent the proverbial bubble, there are eight other shows with bubble-level ratings that can also claim strong online support (highlighted in this list).

Ratings Table 1
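
To make the metric concrete, here is a minimal Python sketch of the ratio calculation. The show names and audience figures are made up for illustration (they are not the Nielsen or comScore numbers behind this table), and the function name web_to_tv_ratio is hypothetical. Under this reading, a value of 0.055 (displayed as 5.5%) means roughly 5.5 monthly unique website visitors for every 100 TV viewers.

```python
# Hypothetical figures for illustration only; NOT the Nielsen or comScore data.
shows = [
    {"show": "Show A", "tv_viewers": 9_500_000, "monthly_unique_visitors": 475_000},
    {"show": "Show B", "tv_viewers": 4_000_000, "monthly_unique_visitors": 400_000},
    {"show": "Show C", "tv_viewers": 20_000_000, "monthly_unique_visitors": 300_000},
]


def web_to_tv_ratio(monthly_unique_visitors, tv_viewers):
    """Size of the website audience relative to the TV audience."""
    return monthly_unique_visitors / tv_viewers


# Rank shows from the largest relative online audience to the smallest.
ranked = sorted(
    shows,
    key=lambda s: web_to_tv_ratio(s["monthly_unique_visitors"], s["tv_viewers"]),
    reverse=True,
)

for s in ranked:
    ratio = web_to_tv_ratio(s["monthly_unique_visitors"], s["tv_viewers"])
    # Print the ratio both as a plain number and as a percentage.
    print(f"{s['show']:<8} {ratio:.3f} ({ratio:.1%})")
```

Ranking by the ratio rather than by raw visitor counts is what lets a show with a small TV audience but a busy website rise to the top of the list.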

I also wanted to get a sense of how engaged the online audience is. Were people simply stopping by the website to check the TV schedule, or were they digging deep for more content? One measure that gets at this question is minutes per unique visitor. The top 20 websites are listed below. Interestingly, 12 of these sites are also found in the previous table. Jericho is one of four shows from the bad-Nielsen-ratings/strong-online-audience group that appear in both tables. (NBC, if you are grousing about ratings for The Office, hopefully these numbers will make you feel a little better.)

Ratings Table 2

The final table addresses my third question about the TV networks and types of shows that are best at building an online audience. ABC has done more than twice as well as CBS in getting viewers online, which may be a reflection of the traditionally older CBS audience. Note: I pulled the top-end outliers (American Idol, So You Think You Can Dance, and Deal or No Deal) from the Network comparison.

The second half of the table brings those TV series back into the mix in the reality/contest category, and you can see the impact. I was surprised at the dearth of sitcoms on this list. It may be that a website for a sitcom doesn't typically make sense.

Ratings Table 3

With all the money spent on TV advertising, I can only hope the networks go beyond the top-line Nielsen ratings to try to get a complete picture of their audiences.

15 comments | Only the last 5 are shown


July 28, 2007
Hadley Wickham said:

In the first table, the second column is labelled "Website audience / TV audience", but the values in the column are percents. This doesn't make sense to me: does 5.5% mean there were 5.5 times as many web viewers as TV viewers, or that website viewers numbered only 5% as many as TV viewers? It's a big difference!

A scatterplot of web audience vs. TV audience would also be useful, especially if supplemented with some reference lines (e.g. 2x, 5x, 10x).


August 2, 2007
Paul Robinson said:

Just out of curiosity, why did you ignore Deal or No Deal in your conclusions? It has by *far* the biggest gap between Nielsen and website audience and it has the longest avg visit time online - yet you don't refer to it once.

I also agree with Hadley - you've spent time putting this stuff together, which is great, but you've not explained what the figures actually mean. Tufte would be ashamed of you! :-)


August 2, 2007
Zach said:

Hadley, You are correct in pointing out that I incorrectly used percentages when it isn't truly a percentage. The metric is intended to show the size of the online audience relative to the TV audience -- but it isn't as if one is truly a percentage of the other. 5.5% represents the ratio of one audience to the other (as shown in the column header). I find it a stretch to interpret 5.5% as 5.5x.
Paul, Good observation. I had suspected that "contest shows" like Deal or No Deal or American Idol drive traffic to their site by getting people to vote online or play an online version of the game (or look at photo galleries of the Deal models in skimpy dresses). In that sense, I was more interested in talking about shows that seemed to be creating loyal audiences through the characters and content of the show.


August 26, 2007
Jennifer Reed said:

I was a Nielsen TV home. The amount of equipment that had to be placed in and on all my TVs, VCRs, video games, etc. sucked. But overall it was kind of cool. Shows like House, Dateline NBC, and the entire Cartoon Network were watched. I have a large family and we made sure we watched television of substance, not like the crap with Paris Hilton. It is kind of cool to feel you have a say in what's good TV. I did this for a few years until I moved. There was no money paid to participate except $30.00 every six months to cover the electricity all the annoying equipment used. Furthermore, they wanted us to be very secretive and completely accurate in what we watched, advising us not to use the TV for company noise, etc. Nielsen, to me, is very competent in how they research who watches what. They once even called me because the TV was on for several hours on the same channel and wanted to know why. It was because the kids were sick and watched Cartoon Network all that day. Please, I would not doubt Nielsen; they are going to be the most accurate you could get unless you monitored every home in the entire world.


August 26, 2007
Zach said:

Jennifer, Thanks for sharing the details of the Nielsen family experience. I've always wondered what exactly was involved. My concern isn't whether they do what they set out to do well...it is that they don't attempt to capture the full picture. With DVRs/TiVos and online viewing, the outside-the-living-room picture is becoming increasingly relevant.






What is Worse Than a "Super Mugging"?

I don't know what you call it, but I know it when I see it. A couple months back I wrote about IBM's sweet $80 million contract to develop ARIS (Achievement Reporting and Innovation System) for the New York City public schools. At the time I used some harsh words to describe this fleecing: swindle...preying on clients' lack of expertise...Dr. Evil...wasted time and effort.

News comes to me from Leonie Haimson, Executive Director of Class Size Matters, that the $80 million price tag is, well, a starting point. She pointed me to a recent article that describes the creeping costs:

The education department's new $80 million student-tracking computer system just got more expensive - and some parents are questioning whether that's the best use of the money.

To ensure that children's test scores and other private data don't get into the wrong hands, the city began accepting bids this week from companies that specialize in safeguarding information, which experts say could add several million dollars to the system's price.

"What's not lost on parents of kids in overcrowded schools is that with the money being spent on this, we could build and staff several more schools," said Tim Johnson, president of the Chancellor's Parent Advisory Council.

Parents are also wondering whether the system's mounting cost is worth it - and why education officials didn't anticipate the extra cost sooner. —New York Daily News

It does seem odd that an $80 million system wouldn't come pretty well stocked with security, particularly from a blue-chip vendor like IBM. On top of that, Leonie hints at other costs that aren't being directly counted toward the implementation of this system:

This initiative has mushroomed into a huge expense that threatens to overwhelm the entire school system, with all the SAFS, data inquiry teams, tests, and even the community district superintendents gobbled up to interpret and try to "coach" schools in the use of the massive data that will be spewed out. The DOE wants to charge much of this to the "contracts for excellence" and our CFE dividend, though it’s a real stretch to see if any of this falls under the specific programs outlined by the state.

Good luck to Leonie, Patrick Sullivan and the others who are stepping up to question this white elephant project.

5 comments


July 8, 2007
yoshi said:

I won't comment on the contract itself except to say that public schools are generally unsophisticated IT users and do naive things like issue press releases on all the wonderful things a system will do before a single line of code is written.

But to the security question. IBM has purchased several security vendors (most notably ISS) and has always had on staff many excellent IT security folks. However, these folks are never involved in bidding, designing, or developing systems for clients. Or if they are, it's usually not a very involved relationship. It's no surprise that during the design or development of whatever system they are putting in here, the question of data integrity and access has been raised. It could be a new state law or an audit that is pushing the issue. Or perhaps someone getting a clue (unlikely). But knowledgeable security practitioners are rarely involved at the beginning, which is where they are needed most.


July 9, 2007
derek said:

It certainly sounds as if IBM have pulled off the Rice Krispie trick (http://www.juiceanalytics.com/writing/2006/12/consulting-and-rice-krispie-treats/) on a grand scale.


July 9, 2007
Zach said:

Now that is a loyal reader!


July 9, 2007
derek said:

Surely you mean "now that's what I call combating recency bias!"? :-)


July 11, 2007
Rob said:

Congrats on your 100th post!
