Python Geocoding Help

Yahoo recently released a nifty geocoder API that's free for small (<50,000 lookups per day), non-commercial applications. Rasmus Lerdorf (Yahoo's PHP king) has written a nice introduction to using this geocoder in your PHP apps. In that spirit, here's a cheap and cheerful Python class that we use to geocode addresses.

from urllib.parse import urlencode
from urllib.request import urlopen
from xml.dom.minidom import parse


class GeocoderError(Exception):
    """Raised when the geocoding request or response parsing fails."""


class Geocoder:
    """
    Look up a location using the Yahoo geocoding api.
    Requires a Yahoo appid which can be obtained at:
    http://developer.yahoo.net/faq/index.html#appid
    Documentation for the Yahoo geocoding api can be found at:
    http://developer.yahoo.net/maps/rest/V1/geocode.html
    """

    def __init__(self, appid, address_str):
        self.address_str = address_str
        self.addresses = []
        self.result_count = 0
        parms = {'appid': appid, 'location': address_str}

        try:
            url = 'http://api.local.yahoo.com/MapsService/V1/geocode?' + urlencode(parms)
            # parse the xml contents of the url into a dom
            dom = parse(urlopen(url))
            results = dom.getElementsByTagName('Result')
            self.result_count = len(results)
            for result in results:
                # one dict per match, seeded with the Result attributes
                d = {'precision': result.getAttribute('precision'),
                     'warning': result.getAttribute('warning')}
                for itm in result.childNodes:
                    # if precision is zip, the Address child node will be empty
                    if itm.childNodes:
                        d[itm.nodeName] = itm.childNodes[0].data
                    else:
                        d[itm.nodeName] = ''
                self.addresses.append(d)
        except Exception as exc:
            raise GeocoderError(str(exc)) from exc

    def __repr__(self):
        s = "Original address:\n%s\n\n" % self.address_str
        s += "%d match(es) found:\n\n" % self.result_count
        for addr in self.addresses:
            s += """Match precision: %(precision)s
            Location: (%(Latitude)s,%(Longitude)s)
            %(Address)s
            %(City)s, %(State)s %(Zip)s
            """ % addr
        return s


if __name__ == "__main__":
    sample_addresses = ['555 Grove St. Herndon,VA 20170',
                        '1234 Greeley blvd, springfeld, va, 22152',
                        '50009']
    for addr in sample_addresses:
        g = Geocoder('YahooDemo', addr)
        print('-' * 80)
        print(g)

All you need to use this is a Yahoo application id.
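If you want to try the parsing step without hitting the network, here is a minimal offline sketch. The XML string is a hand-written, hypothetical response (the coordinates are placeholder values, not real geocoder output), and `parse_results` is an illustrative helper name that is not part of the class above. It mirrors the loop in the class but skips the whitespace text nodes between elements.

```python
from xml.dom.minidom import parseString

# Hypothetical response written by hand for illustration only;
# the latitude and longitude are placeholder values.
SAMPLE_XML = """<?xml version="1.0"?>
<ResultSet>
  <Result precision="zip" warning="">
    <Latitude>41.0</Latitude>
    <Longitude>-93.5</Longitude>
    <Address></Address>
    <City>Altoona</City>
    <State>IA</State>
    <Zip>50009</Zip>
  </Result>
</ResultSet>"""


def parse_results(xml_text):
    """Return one dict per Result element found in xml_text."""
    dom = parseString(xml_text)
    addresses = []
    for result in dom.getElementsByTagName('Result'):
        d = {'precision': result.getAttribute('precision'),
             'warning': result.getAttribute('warning')}
        for itm in result.childNodes:
            if itm.nodeType != itm.ELEMENT_NODE:
                continue  # skip whitespace text nodes between elements
            # an empty element (e.g. Address for a zip-only match) has no children
            d[itm.nodeName] = itm.childNodes[0].data if itm.childNodes else ''
        addresses.append(d)
    return addresses


addresses = parse_results(SAMPLE_XML)
for addr in addresses:
    print(addr['precision'], addr['City'], addr['State'], addr['Zip'])
```

Skipping non-element nodes keeps stray '#text' keys out of each dict; otherwise the logic is the same as in the class.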

You now have four different ways to geocode your company's vital address. If you have suggestions or improvements, let us know. This code is public domain.

Go with the flow in data display

We spent the last couple of days working with a client on displaying data for real-time dashboards. It got me thinking: Are there implicit assumptions and mental habits that people bring to data interpretation? And if so, are there some basic practices to consider for visualizing data?

This isn't to say there is one right and perfect way to display any particular data; there is room for both creativity and structure. (Check out Information Aesthetics for examples of creative data visualization.) But in the world of management communication, it can't hurt to be aware of your audience's ingrained assumptions. You want the smoothest path to your important points. The risk is in missing your tiny window to focus a frazzled executive's mind on your point -- and finding that your carefully constructed analysis gets sidetracked.

Here's a starter list of these embedded assumptions:

1. Axes are often the last thing people look at in a chart. They expect time to progress from left to right and linear scales that start at zero. If two charts are adjacent, they will probably assume the axes and scales are the same. When it comes to the famous two-by-two consulting matrix, good things happen in the upper-right; bad things are in the lower-left. That said, I'm mystified that the famous BCG growth/share matrix insists on rejecting my new rule.

BCG Growth Share

2. Fluff. Dressing up your display implies you aren't comfortable with the data's ability to stand on its own or you don't have much to say. This can include clip art, data incorporated into pictures, and animation. USA Today is particularly good at this. Check out a couple of examples from their Snapshots section. They have fewer than three numbers to communicate, but fill the space with eye-catching graphics.

USA Today Snapshot 1

USA Today Snapshot 2

3. Point of focus. Most data displays have a clear point of focus for the viewer, whether the presenter intends it or not. It could be the peak in a line chart, values crossing over zero, or a sudden change in values. In a chart like this (below), your intention may be to highlight the general growth trend -- but you can't avoid the inevitable questions about the drop after 2000. You can short-circuit these off-the-topic questions with an explanatory footnote or annotation. Ask yourself: what is the main point I want the reader to get, and what else will my data presentation imply?

Example graph

4. Proximity and size. Placing information close together suggests a connection. Sometimes accidental proximity can cause confusion. You might present two unrelated phenomena next to each other and the audience will automatically try to draw a connection (e.g. dogs have big teeth; teeth are good for crunching carrots. Audience thinks: dogs must like to crunch carrots). I just ran across Live Plasma, a great site that lets you enter a musical artist (or band, movie, director, or actor) and then shows you related artists. The designers of this data visualization do a great job of building on our data display expectations by using size and proximity to show related artists.

Neil Young map

3 comments


February 27, 2006
Robbin Steif said:

Live Plasma looks cool but is not intuitive enough. You point out that the axes are the last-looked-at, but here I found myself desperately searching for a legend to understand if size or proximity or color matters.

Robbin Steif
LunaMetrics (http://www.lunametrics.com/?source=blog&segment=other)


February 27, 2006
Zach said:

Good point, Robbin. It is hard to find the meanings for the size, color, and proximity on the site.


March 9, 2006
Mary said:

Speaking of mental habits, as you were at the beginning of this article, I am wondering if you have spent any time reading about Art Costa's Habits of Mind ideas. It is, of course, education, not business, oriented.

(Re) Introducing Absolutely Google Earth

A while back we released a collection of tools and resources for Google Earth. We've restructured the page a bit and added a few new links. Check out the new version and make sure to let us know if you have anything to add.

Scaring Your Users

With the release of screenshots of Microsoft Office 12, I started thinking about how abrupt a change it is from the current interface design. Aren't they worried about scaring away the users who are so comfortable with the current design? Probably not:

  • Microsoft Office has no real competitors and isn't too worried (yet) about losing users.
  • Microsoft has to justify to their users that spending a few hundred dollars on an upgrade is worth it. The best way to do that is to make it look very different.
  • New Interface = Users that need new training = $$$ for Microsoft

So Microsoft isn't in trouble then. But not all products have that luxury. Being a recent graduate, I was not immune to the poker virus that hit college campuses. Every once in a while I play online at PartyPoker. The other day I logged in, approved the mandatory software upgrade, and fired it up ready to play. When I opened it up, I almost gasped.

Old Interface

Old Image

New Interface

New Interface

I was intimidated. This doesn't work for PartyPoker:

  • Poker sites have a lot of competition and high turnover
  • Most poker sites pretty much have the same features, games, and functionality.
  • Users that need more training = Users that switch to another site

Every week, poker sites run promotional bonuses to try to lure people from one site to another. The only thing keeping users from switching is that they are comfortable with a site's look and feel. If you take that away from them, you're making it easier for them to switch to a site with a nice promotion. PartyPoker would have been a lot better off gradually adding in their new features and making sure that their users absorbed the changes as they came.

ESPN is a great example of understanding users' attachment to an interface. They are constantly adding new features (like streaming video) and changing the look and feel of their site, but in a controlled, conservative way.

Illustrating Imprecision with Excel

A few days ago Zach made a nice point about Zillow. It's oh-so-easy to produce numbers that are precise but are not accurate. Here's a quick screencast to show you one fun way to draw the distinction in Excel using number formatting.

Click picture to view video.

Note: In the screencast, I say precision when I mean to say accuracy no fewer than *four* times. Sorry.
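The same idea carries over outside Excel. As a rough sketch (the helper name `format_approx` and the sample figure are made up for illustration), you can keep only a couple of significant figures when displaying an estimate, so the formatting doesn't promise more than the number can deliver:

```python
import math


def format_approx(value, digits=2):
    """Format a dollar estimate keeping only `digits` significant figures."""
    if value == 0:
        return "~$0"
    # position of the leading digit, e.g. 5 for 437216
    magnitude = math.floor(math.log10(abs(value)))
    # size of the smallest digit we are willing to show
    step = 10 ** (magnitude - digits + 1)
    rounded = round(value / step) * step
    return f"~${rounded:,.0f}"


print(format_approx(437216))      # two significant figures
print(format_approx(437216, 1))   # one significant figure
```

How many digits to keep is a judgment call; the point is to make that choice deliberately instead of showing every digit the calculation happens to produce.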

Video of Excel 12 Business Intelligence Inaction

Here's video of the new analytics capabilities coming in Excel 12, including the revisions to PivotTables. Microsoft is pushing hard to weave Excel, SQL Server, and SharePoint into an integrated system.

It's early, but I'm concerned that analysts will have to know even more to get useful work done. Analysts would benefit from PivotTables that are easier to use rather than PivotTables that require knowledge of SQL Server, SharePoint, Unified Data Models, etc.

If you're an analyst, check out the video and let us know if the new Excel approach would work in your organization. The video is 50 minutes long. Jump to 9 minutes in if you want to get past the intro chitter-chatter.

What is analytics?

A reader wrote to us today:

I seem to have spent the last few days (not including the week-end I must add) trying to get to grips with 'Analytics'. If [my boss] comes in wanting a 5 word answer to his question "what exactly is an analytic?" I think I'd still be at a loss as to how to define it.

It's a great question. Analytics (along with its sister/twin term Business Intelligence) gets thrown around without much clarity as to its meaning. You might think that, with the word in our name, we'd have long ago nailed down a definition. Not so. (Although we do have a good understanding of what "Juice" means.)

Below is my take on a "map" of the analytics world.

Map of analytics

I used a couple of dimensions to help frame all the parts and pieces:

  • Purpose. A concept of "exploration vs. control" highlights the difference between analysis and reporting. Analysis is about digging deep into data to discover relationships, find causation, and describe phenomena. Reporting, in contrast, is used to track performance and identify variation from goals.

  • Timing. Most analytics is backward looking -- an attempt to understand what has happened, and therefore be equipped to make better decisions in the future. Alternatively, analytics can focus explicitly on predicting future performance or, in a few cases, provide information to support decisions in real-time.

I'd really appreciate any comments on this map -- whether I've missed/misgrouped/misrepresented concepts or alternative dimensions to describe the space. The more clarity we can provide in describing "what is analytics" the more palatable the concept will be.

14 comments (only the last 5 are shown)


September 10, 2006
sudharshan sundarrajan said:

A pretty good diagram. I would like to add a new dimension(or maybe an implied one!) to the purpose. We normally classify analytics into 'Market analytics' and 'Risk analytics' in our organisation. Intelligent 'Market analytics' aids brilliantly in marketing and pro-active customer care. 'Risk analytics' deals with identifying potential risks, their 'riskiness' over a period of time, risk mitigation strategies and their effectivess etc. 'Risk anlaytics' is slowly moving a lot of business decisions in a lot of organisations from being affected by judgemental bias.


January 22, 2007
Mohan said:

I am not against analysis as a tool but there is far too much of analysis thinking that it will solve all buisness problems. Many a managers feel that real life business needs "Synthesis" more than analysis. All the parameteres of buisness environment can't be quantified and many important ones are soft ones or intangibles difficult to quantify. I prescribe more to Alexander Christopher's philosophy where more important than analysis is synthesis of which un-fortunately there is very little talk and even lesser training of managers. Our Management Institute has gone to the extent of even introducing a full fledged MBA i.e. Masters in Business Analysis. I am afraid too much of analysis may lead to paralysis. In the end no mathematical model can replace human decision making for which as yet no effective replacement has been found.


January 24, 2007
Deven said:

Hi Mohan,
Masters in Business Analysis sounds interesting. Can you please share more details of your Management institute?


February 1, 2007
Harry said:

Would you consider Predictive Analytics to cover any of the "risk analytics" that Sudharshan is talking about? Does it cover more than just the market side ?


March 5, 2007
Sateesh tadur said:

going by the terminology used in the Business analytics are there any statistical techniques thar used in the commercial context. I would like to know specific multivariate techniques applied in this area.

Zillow's challenge: precision implies accuracy

Zillow released its home value assessment tool recently. It is a tantalizing concept: they claim to have put a dollar value on over 40 million homes across the country. I rushed to the site and was satisfied with the results for my house. Then I was overjoyed to find that the new bathroom we are adding in the basement will increase our home value by $85,000. Nice! Better yet, I found that if I just add five more bathrooms, I can double the value of my house. I guess buyers would agree with me: it is nice to have a bathroom nearby when you need it.

Numbers like these have made some people suspicious. A recent article in the Washington Post criticized Zillow for its inaccuracies:

Offering automated property valuations via the Internet turns out to be much harder than it seems -- especially if you expect them to be accurate. But after running extensive tests on this ambitious national real estate service, I found it to be so inaccurate that it's not useful.

The founder, Lloyd Frink, fully acknowledges the problems, but believes more information is better. It can only help, he argues, to give people more information in the confusing home buying or selling process.

Here's the problem (one I've run into many times in the world of analytics): if you present something with precision, your audience will believe your numbers are accurate. Particularly if you are backing it up with language like:

We compute this figure by taking zillions of data points — much of this data is public — and entering them into a formula...[it] is incredibly robust and sophisticated...Hundreds of home details feed into the formula and the home characteristics are given different weights according to their influence in a given geography and over a specific period of time.

There is a related phenomenon in software development -- The Iceberg Secret -- described by Joel Spolsky:

If you show a nonprogrammer a screen which has a user interface which is 100% beautiful, they will think the program is almost done.

If the front end looks nice, most people assume everything behind the scenes works well.

I feel for the statisticians at Zillow. Creating a database with a majority of home values within 10 or 20% of reality is a monumental task. Unfortunately, even that isn't good enough. It doesn't take many wildly inaccurate estimates to undermine the credibility of the whole tool.
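To make "within 10 or 20% of reality" concrete, here is a minimal sketch of how you might measure it. The function name `share_within` and the five estimate/sale-price pairs are invented for illustration; they are not Zillow data.

```python
def share_within(estimates, actuals, tolerance):
    """Fraction of estimates whose error is at most `tolerance` of the actual value."""
    hits = sum(
        1 for est, act in zip(estimates, actuals)
        if abs(est - act) <= tolerance * act
    )
    return hits / len(actuals)


# Hypothetical estimated values and actual sale prices
estimates = [310000, 455000, 198000, 720000, 265000]
actuals = [300000, 400000, 200000, 500000, 250000]

print(share_within(estimates, actuals, 0.10))
print(share_within(estimates, actuals, 0.20))
```

In this made-up sample most estimates land within 20%, yet the one that misses by over 40% is the one a homeowner would remember.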

I'm reminded of a story passed around in the consulting business: Imagine sitting down in your seat on a flight and noticing that the seat belt sign above your head doesn't work. The fact that some little light isn't working doesn't imply there is anything wrong with the airplane's engines, navigation system or anything that truly could impact your likelihood of arriving at your destination. But that little failure can make you nervous.

9 comments (only the last 5 are shown)