Tufte-Style Comparison Chart Generator

Last week, we shared a rendition of a Tufte graphic using just a few lines of NodeBox code. As our commenters pointed out, Python is great, but not every business analyst has a burning desire to learn a programming language just to generate some nifty graphs. I spent some time porting Chris's NodeBox rendition to a PIL-based (Python Imaging Library) Windows tool that generates the same sort of comparison graph from an Excel file on the fly.

The result is The Comparison Chart Generator 1.0. Installation is simple: unzip the zip file and run comparisionchartgenerator.exe.

Alternatively, we have a new Excel workbook that creates the same effect using only built-in Excel functionality. Download the Excel Tufte Line Chart here.

If you are using the Chart Generator, start with some data in Excel (.xls) or comma-separated values (.csv) format. The data for the graph must be on the first sheet, starting at cell A1, as in the following picture.

Excel Dialog
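If you would rather prepare or check a CSV in code before loading it, here is a minimal sketch that reads it into (label, first, last) rows. It assumes a header row followed by one row per item, with the label in column A and the two values in columns B and C; the exact layout the Chart Generator expects may differ, and `read_comparison_csv` is a hypothetical helper, not part of the tool.

```python
import csv
import io


def read_comparison_csv(f):
    """Read (label, first, last) rows from a CSV file object.

    Assumes (hypothetically) a header row, then one row per item with the
    label in column A and the two values in columns B and C.
    """
    reader = csv.reader(f)
    header = next(reader)  # e.g. ["Country", "1970", "1979"]
    rows = []
    for record in reader:
        if not record or not record[0].strip():
            continue  # skip blank lines
        rows.append((record[0], float(record[1]), float(record[2])))
    return header, rows


sample = io.StringIO("Country,1970,1979\nSweden,46.9,57.4\nJapan,20.7,26.6\n")
header, rows = read_comparison_csv(sample)
print(header)  # ['Country', '1970', '1979']
print(rows)    # [('Sweden', 46.9, 57.4), ('Japan', 20.7, 26.6)]
```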

Select an input file. A couple of example files are bundled with the download.

Open File Dialog

After selecting a file, you'll be prompted to modify a few of the basic options available for the chart.

Options Dialog

Finally, save the result as a JPEG.

Save File Dialog

Here is the same graphic from Tufte's book, processed with the Comparison Chart Generator. It is generated from the example CSV file bundled with the download.

Tufte-esque Chart by Comparison Chart Generator

Those of us who have undergone LASIK eye surgery may still prefer the sharp, crisp NodeBox results, but for the rest of us, this image looks pretty good. Let us know if this tool is useful. If there is enough positive response, we may consider expanding it to other fancy Tufte-esque charts.

If you prefer NodeBox, I have an updated script here. It pushes the script up to 20 lines of code or so, but the extra 9 lines let the labels push themselves apart on their own. If you want to look at the source code for the Windows program, you can get it here. I used py2exe to compile it into an executable. The code, however, has not yet been thoroughly commented or cleaned up, so edit it at your own risk.
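The updated script is not reproduced on this page. As a hedged illustration of the general idea only, the sketch below nudges label positions apart in one dimension until no two neighbours are closer than a minimum gap. The name `spread_labels` and its `min_gap` and `iterations` parameters are assumptions, not taken from the script.

```python
def spread_labels(positions, min_gap, iterations=100):
    """Push 1-D label positions apart until neighbours are >= min_gap apart.

    positions: one y coordinate per label. Returns a new list in the
    original order. Each pass moves every overlapping neighbour pair
    apart symmetrically, so the group stays centred where it was.
    """
    ys = list(positions)
    # Label indices sorted by position (stable, so ties keep input order).
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    for _ in range(iterations):
        moved = False
        for a, b in zip(order, order[1:]):
            overlap = min_gap - (ys[b] - ys[a])
            if overlap > 1e-9:
                ys[a] -= overlap / 2  # move the upper label up
                ys[b] += overlap / 2  # and the lower label down
                moved = True
        if not moved:
            break  # every neighbour pair is far enough apart
    return ys


print(spread_labels([100, 102, 200], min_gap=12))  # [95.0, 107.0, 200]
```

In a NodeBox script, one might pass the `ypos(...)` values of the left-hand labels with a gap of roughly the font size, then draw each label at its adjusted position while still drawing the line at the true value; that wiring is an assumption, not the author's code.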

18 comments (only the last 5 are shown)


May 13, 2008
Kasper said:

Great tool. One question: is there a way to change the number of decimals shown? Currently it seems to show just one decimal, whatever the number format in the xls spreadsheet.


May 14, 2008
Sal said:

As promised, I posted an Excel chart of the same graph. You can find the link near the top of the page.


May 15, 2008
Jose Hernandez said:

I have an alternative post on a dynamic Excel bump chart that combines charts with the cell grid. You can download it at http://sites.google.com/a/visual-catalyst.com/info_displays/Home/tufte_example_bumpchart.xls?attredirects=0

This display works in all versions of Excel. I'm working on a how-to that describes how you can extend this type of chart.


May 31, 2008
Christof said:

Excellent work. I'm impressed!


July 2, 2008
John said:

Awesome - using it right now. More Tufte-style charting programs, please!






Real-World Tufte Graphics in 11 Lines of Code

Check out our follow-up post, which describes how we created a downloadable Windows application and an Excel spreadsheet you can use to create these graphics.

One of the troubles with Tufte is that his approach to design is frustratingly infeasible for real people in business. One of his recommendations is to use Adobe Illustrator:

“Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.”

Raise your hand if you have a graphic design assistant at your beck and call. I thought not.

One of the tools we use for rapid prototyping at Juice is NodeBox.

NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie. NodeBox is free and well-documented.

All true. But it's more helpful to think of NodeBox as a free Adobe Illustrator that you can program in the world's easiest programming language.

I wanted to see if we could reproduce the following graph from The Visual Display of Quantitative Information, p. 158.

Tufte Current Receipts Graphic

Here's the code. It's 11 lines of code if you exclude entering the data and setting things like fonts and colors.

size(500,700)
font('Palatino')
fontsize(12)  
stroke(0.4)  # a medium grey for lines
fill(0.2)    # a slightly darker grey for text  

# data = (label, first, last, start-label fudge factor, end-label fudge factor)

data = [ ('Sweden', 46.9, 57.4, 0., 0.),
         ('Netherlands', 44.0, 55.8, .3, 0.),
         ('Norway', 43.5, 52.2, 0., 0.),
         ('Britain', 40.7, 39.0, 0., 0.),
         ('France', 39.0, 43.4, 0., 0.6),
         ('Germany', 37.5, 42.9, 0., -0.4),
         ('Belgium', 35.2, 43.2, 0., 0.),
         ('Canada', 35.2, 35.8, .8, 0.4),
         ('Finland', 34.9, 38.2, -0.5, 0.),
         ('Italy', 30.4, 35.7, 0.3, -0.3),
         ('United States', 30.3, 32.5, -0.3, 0.),
         ('Greece', 26.8, 30.6, 0.4, 0.),
         ('Switzerland', 26.5, 33.2, -0.2, 0.1),
         ('Spain', 22.5, 27.1, 0., 0.3),
         ('Japan', 20.7, 26.6, 0., 0.), ]

text("Current Receipts of Government as a Percentage of "
      "Gross Domestic Product, 1970 and 1979", 20, 70, width=215)
text("1970", WIDTH*.28, HEIGHT*0.03)
text("1979", WIDTH*.68, HEIGHT*0.03)

def ypos(val):
    # calculate a vertical position by scaling between 10% and 90% 
    # of the height of the image
    return HEIGHT * (0.9 - 0.8 * (val - minval) / (maxval - minval))

# find the minimum and maximum values in the range

alldata = [d[1] for d in data] + [d[2] for d in data]
minval, maxval = min(alldata), max(alldata)

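for label, start, end, startfudge, endfudge in data:
    # left column: country name and first value, right-aligned;
    # the fudge factors nudge labels so they don't overlap
    align(RIGHT)
    text(label, 0, ypos(start+startfudge)+4, width=0.25*WIDTH)
    text("%0.1f" % start, 0.25*WIDTH, ypos(start+startfudge)+4, width=0.07*WIDTH)
    # right column: last value and country name, left-aligned
    align(LEFT)
    text(label, WIDTH*.75, ypos(end+endfudge)+4)
    text("%0.1f" % end, 0.68*WIDTH, ypos(end+endfudge)+4, width=0.07*WIDTH)
    # the slope line itself uses the true (unfudged) values
    line(WIDTH*.33, ypos(start), WIDTH*.67, ypos(end))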

Here's what the result looks like.

Tufte Current Receipts Graphic with NodeBox
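The `ypos` helper in the NodeBox code maps a data value linearly onto the canvas: the largest value lands 10% of the way down and the smallest 90% of the way down (screen y grows downward). Pulled out of NodeBox, with the canvas height and the data range passed in explicitly (a change made here only so it runs in plain Python), it can be checked directly:

```python
def ypos(val, minval, maxval, height):
    """Map val linearly so maxval -> 10% of height and minval -> 90% of height."""
    return height * (0.9 - 0.8 * (val - minval) / (maxval - minval))


HEIGHT = 700
lo, hi = 20.7, 57.4  # smallest and largest values in the 1970/1979 data

print(ypos(hi, lo, hi, HEIGHT))  # about 70: top of the plotting band
print(ypos(lo, lo, hi, HEIGHT))  # about 630: bottom of the plotting band
```

Because the mapping is linear, equal differences in percentage points become equal vertical distances, which is what lets the slopes be compared honestly.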

We have some great follow-ups to this planned for next week. We'll reimplement this code with the Python Imaging Library, which will open things up for Windows users. We also have plans for mashing these graphics up with our just-released Google Analytics API.


21 comments (only the last 5 are shown)


May 16, 2008
Chris Gemignani said:

Who's up for a multi-language infographics shootout?


May 18, 2008
Tim said:

That's cool!

I was wondering if there is a way to generate these graphics from the command line? That way we could embed this in a web application and get the graphics generated dynamically.

Note: it looks like the comments in your code got converted to HTML (# -> h1).


May 18, 2008
Kragen Javier Sitaker said:

Is there a way to get old-style numerals with NodeBox? I suppose you have to find an installed font on your Mac with old-style numerals.

Pradeep's processing.js demo is awesome, but judging from the screenshot it lacks antialiasing. (I'm not yet a Firefox 3 Achiever.)


May 19, 2008
Luke said:

Dude, why reproduce the errors ("fudge factors") in the original?


May 26, 2008
The Dude said:

@Luke: Dude, the fudge factors are not errors. They are there so that the text labels do not overlap.






New Year’s Resolution: Tufte and the iPhone

Edward Tufte has produced an illuminating video tour of the user interface of the iPhone. The video illustrates Tufte’s struggles to come to grips with the difference between dynamic screen resolution and the resolution of printed paper. Tufte is prone to grandiose pronouncements, like this one:

All history of improvements in human communication is written in terms of improvements in resolution: to produce, for viewers of evidence, more bits per unit time, and more bits per unit area. Slideware is contrary to that history. Trading in reductions in resolution for user convenience or for pitching may be useful in mass market products or in commercial art, but not for technical communications. The solution is not to rescue slideware design; the solution is to use a different, better, and content-driven presentation method. On this solution, see our thread PowerPoint Does Rocket Science—and Better Techniques for Technical Reports — Tufte Nov 10 2006

Somehow, I don’t think the importance of the Gutenberg Bible related to it showing “more bits per unit area.” Quick, count the “bits per unit area.”

Gutenberg Bible courtesy of Wikipedia

Illustrated Bible courtesy of Wikipedia

It didn’t take bits per unit area to revolutionize communication in the past and it won’t in the future either. The iPhone is a tremendously engaging information device and points the way forward for information displays. Here’s what the iPhone does well:

Maximize screen real estate: Controls are only visible when needed, fading away gently when you are concentrating on content. Tufte furiously neologizes, calling this “computer information debris.” Control junk is more apt, more terse, more Tuftian.

Direct manipulation: As Tufte says: information is the interface. Filtering and choosing should take place in the context of direct manipulation. A good essay on the possibilities of direct manipulation can be found here.

Fun: Above all, information can be fun and engaging to navigate. Tufte condemns Apple’s stock ticker for having “cartoony” and PowerPoint-like displays and offers an improved version (with 5 digits of precision). Apple’s cheery display offers a more entertaining, usable interface for day-to-day usage.

With our empathy for the day-to-day troubles of the business person seeking insight in data, it’s frustrating listening to Tufte. He is clearly an academic, with academic interests and academic timeframes. As much as his work is respected and inspirational within business circles, he makes little effort to enable his message to be implemented.

Good Tufte: Clutter and overload are not attributes of information; they are failures of design. If the information is in chaos, don’t start throwing out information; instead, fix the design.

Bad Tufte: “…the conclusion of sparkline analysis in Beautiful Evidence, where the idea is to make our data graphics at least operate at the resolution of good typography (say 2400 dpi).” http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msgid=0002NC&topicid=1 *Ed: At least 2400 dpi? Orly?

Mostly right Tufte: “Thus the iPhone got it mostly right.”

Mostly wrong Tufte: “Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.” http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msgid=0000Jr&topicid=1&topic=Ask%20E%2eT%2e

It is heartening to see Tufte engage and connect his mental frameworks to our modern, screen-oriented, graphics-accelerated, not graphics-designed world. But the future of information design and interaction belongs to the iPhone, not the printed page.

3 comments


January 25, 2008
ross said:

Nice post, thanks for making it. I found it interesting, and I think it's good that people are prepared to question Tufte, who seems to have, rightly or wrongly, some godlike stature.
For my part, I have used the TyTN series since the Mark 1, and these, running Windows Mobile, have had all of the features (more or less) of the iPhone for some time. Compromise is the key with small devices - until we get screens that can project into the air! :-)
Cheers
Ross


January 25, 2008
mahalie said:

It's always folly to never question anything someone says just because you have a lot of respect for their ideas generally. Yet I see many bloggers flame well-respected experts...probably as traffic bait. So great to hear a voice of reason. Thanks!


January 29, 2008
darrell said:

"To clarify add Detail" - as an example, Tufte adds a satellite weather pattern to augment a weather forecast of X degrees and Partly Cloudy. How does that clarify? You need expertise to interpret it, and it didn't offer analysis / interpretation just raw data (satellite view).

I understand his point if you're presenting to a panel of experts. But the iPhone is sold to consumers, not weather forecasters.

Few of us have weather forecasting expertise (beyond idle speculation). Using the satellite video, a non-expert could probably guess the degree of cloudiness, and perhaps the direction of the wind. Other useful info like wind speed, wind chill factor, probability of precipitation and temperature is not aided by the satellite visual.

Eye candy; yes. Useful; only to a limited expert audience, and only with additional information not displayed.

"To Clarify; first consider the audience, then add relevant detail."






Excel 2007 and the Lie Factor

“The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.”

Edward Tufte calls violation of this principle the “Lie Factor”. The implementation of in-cell data bars in Microsoft Excel 2007 is a big offender.

Almost a year ago, I was surprised to discover that the Microsoft Excel 2007 development team didn’t understand what zero means. Their implementation of in-cell data bars showed a bar in a cell, even if the cell had a zero or very low value.

Data bars in the Excel 2007 prototype

That was in the Excel 2007 Beta. Things haven’t improved in the current version of Excel 2007. The default setting for data bars in Excel 2007 is to scale the bars so that the smallest bar is based on the smallest value in the selected range and the largest bar is based on the largest value. It still appears that the smallest bar will be no smaller than five or ten percent of the width of the cell. Here’s a sample:

Sample data bars in Excel 2007

So, if you select a range that has values between 600 and 700, the 600 would have a little bitty bar and the 700 would have a full-width bar. Based on the bars, it would look like the 700 is ten to twenty times larger than 600. Outside of Redmond, this is generally regarded as untrue.
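To put a number on it, Tufte's Lie Factor is the size of the effect shown in the graphic divided by the size of the effect in the data, where "size of effect" is the relative change between two values. The sketch below models the default min-to-max scaling as a linear map that gives the smallest value a fixed minimum bar; the 10% minimum is an assumption taken from the "five or ten percent" observation above, not Excel's documented behavior. It compares that model with zero-based bars.

```python
def min_max_bar(value, lo, hi, min_frac=0.10):
    """Hypothetical model of the default: lo -> min_frac of the cell, hi -> full width."""
    return min_frac + (1.0 - min_frac) * (value - lo) / (hi - lo)


def zero_based_bar(value, hi):
    """Bar length directly proportional to the value: 0 -> empty, hi -> full width."""
    return value / hi


def effect(a, b):
    """Tufte's 'size of effect': relative change from a to b."""
    return (b - a) / a


def lie_factor(shown_a, shown_b, data_a, data_b):
    """Size of effect shown in the graphic / size of effect in the data."""
    return effect(shown_a, shown_b) / effect(data_a, data_b)


small, large = 600, 700
default = [min_max_bar(v, small, large) for v in (small, large)]
honest = [zero_based_bar(v, large) for v in (small, large)]

print(default[1] / default[0])             # about 10: the 700 bar looks ten times longer
print(lie_factor(*default, small, large))  # about 54
print(lie_factor(*honest, small, large))   # about 1: an honest graphic
```

With a 5% minimum instead, the same model makes the 700 bar twenty times the 600 bar, which brackets the "ten to twenty times" estimate above.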

What’s more, if you create two sets of data bars side by side, each group of data bars scales itself independently even though they look the same. Take a look at this screenshot:

Sample data bars from two different conditional formats in Excel 2007

Notice the top seven cells have data bars that have one set of scaling and the bottom data bars have a different scaling. However, they look identical, and users should generally expect these bars to have the same scale.

Here are the rules:

  1. Defaults matter! It doesn’t matter that you can do data bars correctly in Excel. The default should be to do it right and it should be hard to do it wrong.
  2. The “right way” to make data bars is to make the length of the data bar directly proportional to the value in the cell. If one cell has a value twice another it should have a bar that is twice as long.
  3. Remove the default gradient shading. The gradient makes it hard to tell where the bar ends, obscuring what you’re trying to show.
  4. Contiguous cells with data bars should all use the same scale. Use different colors to indicate ranges that have different scales.
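Rules 2 and 4 together amount to one zero-based scale per block of neighbouring cells: divide every value by the largest value across all the ranges that sit together. Here is a minimal sketch; the function name and the `width` parameter, standing in for the cell width, are hypothetical, and it assumes non-negative values.

```python
def shared_bar_lengths(ranges, width=1.0):
    """Zero-based bar lengths using one scale shared by every range.

    ranges: non-empty lists of non-negative numbers (e.g. two adjacent
    blocks of cells). A value twice another gets a bar twice as long,
    and equal values get equal bars no matter which range they are in.
    """
    top = max(max(r) for r in ranges)  # one maximum for all ranges (rule 4)
    if top <= 0:
        return [[0.0 for _ in r] for r in ranges]
    return [[width * v / top for v in r] for r in ranges]  # rule 2


upper = [600, 650, 700]
lower = [60, 65, 70]
bars = shared_bar_lengths([upper, lower], width=100)
print(bars[0][2], bars[1][2])  # 100.0 10.0
```

Under per-range min-to-max scaling, both blocks would get identical-looking bars; with the shared scale, the bars in the lower block are a tenth as long as those above them.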

Excel 2007 supports at least twenty-five different combinations of settings for specifying the length of the data bar.

Five different ways of setting data bars

Exactly one of those ways is correct. Base the shortest bar on the number 0. Base the longest bar on the highest value. Turn off the gradient. If you want to see bars based off percentile or some custom formula, then be explicit. Create a new column, create your formula, create bars on that column.

Please, guys, this isn’t rocket science. This is plain common sense. You would not ship Microsoft Word with a glaring bug in the way text renders. You would not ship Excel with a broken statistical function that people use everyday. Delivering deceitful-by-design infographics betrays your central role in democratizing the analysis of data. Until you fix this, in-cell ASCII art still remains the best way to explore data visually.

A disclosure: We do not currently use Excel 2007 at Juice Analytics. This is not due to a high-minded sense of moral outrage but is merely a reflection of our clients' environments.

7 comments (only the last 5 are shown)


June 13, 2007
Will Oswald said:

"You would not ship Excel with a broken statistical function"... erm, unless you include the LINEST function that up until Excel 2003 did not adjust for collinearity in multiple regressions, a fundamental problem


June 13, 2007
Chris Gemignani said:

You noticed I qualified my statement with "that people use everyday". I have heard about this problem and others with Excel's statistical functions. These problems should have been fixed as soon as they were reported.


July 10, 2007
R Varley said:

Hi, I'm trying to write an evaluation document on Excel 2007; everyone seems to think it's rubbish for statistics, but no-one says what's wrong. I've been trawling the internet for days, and turned up nothing beyond "Everyone knows it's broken". Can you give me any pointers?

Thanks.


October 11, 2007
Patrick O'Beirne said:

1st Oct 2007:
Data Bars – Feedback Please
Today’s author: Scott Ruble, the program manager who leads the charting and visualization efforts in Excel. Scott is looking for some feedback on potential changes to data bar behaviour.

http://blogs.msdn.com/excel/archive/2007/10/01/data-bars-feedback-please.aspx


October 12, 2007
Chris Gemignani said:

Patrick, I commented on the Excel databars post. I'm astonished that these questions keep coming up. The solution is simple: "You need to start with the absolute principle that the bars you show _must_ be proportional to the numbers they represent."
