Another Hot Data Trend — Same Timeless Goal

Every major company is now saying they have Generative AI and are doing AI. The hype battle is long underway, and the marketing avalanche is in flight.

Industry professionals know it’s merely the beginning of something much bigger that will take time to unfold; meanwhile, most of the market thinks it’s just another thing.

🗽 🇺🇸 🌭 Who's the greatest President? Whose greatness is most disputed?

525 historians and political science scholars got together and ranked all the Presidents. It sounds like the ultimate bar game for U.S. history professors. They recently released their results.

As per usual, the report does no favors for the casual reader who might want to explore this data. So I put together an interactive app (open in a new window).

This is the beauty of Juicebox: if only one of those 500+ academics had asked, I could have given them a 10x better way to present their data before they released their PDF.

Battle of the Chatbots!

Which Chatbot is the best? Check out our interactive app below:

🤖 Which organizations are taking the lead with their Chatbot models?
🤖 How are Chatbots improving over time?
🤖 Which University has created a top 5 Chatbot?

The data is sourced from the Large Model Systems Leaderboard (https://lnkd.in/gvDSKSN9), "a crowdsourced open platform for LLM evals. We've collected over 200,000 human preference votes to rank LLMs with the Elo ranking system."
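For readers unfamiliar with Elo, here is a minimal sketch of the standard Elo update applied to a single head-to-head vote between two chatbots. The K-factor of 32, the starting ratings of 1000, and the function names are illustrative assumptions; the leaderboard's actual parameters and implementation are not described here.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one preference vote.

    k is an assumed K-factor; larger values make ratings move faster.
    """
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two models start level; a voter prefers model A.
new_a, new_b = update_elo(1000.0, 1000.0, a_won=True)
print(new_a, new_b)  # 1016.0 984.0
```

An upset (a low-rated model beating a high-rated one) moves both ratings further than an expected result does, which is why a large pile of individual votes can settle into a stable ranking.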

The Irregular Path of Data Analysis

Change does not happen in a straight line. And we do a disservice when we think about “data driven decisions” as a simple sequence of events:
gather data —> do analysis —> find insights —> present insights —> action

Let’s take a few examples from outside the world of analytics:

1. In 1969, a community of Native Americans protested on the island of Alcatraz in the San Francisco Bay. For 19 months, they occupied the island, demanding the return of the land. In the end, the protest fizzled and their demands were rejected. However, their efforts were not ultimately without effect. In the following years, President Nixon signed a series of bills to give back millions of acres of land to Native Americans and provide support for their communities.

2. Marketing professionals have had to embrace the messy, complex reality of multi-channel and multi-touch marketing. This approach recognizes that purchasing decisions aren’t a one-and-done conversion event. In fact, it can take 8 or 15 touches with a consumer to get to a purchase decision. That makes marketing more like a series of nudges than a single convincing argument.
Action comes about through a circuitous route.

Analytics professionals need to internalize this same lesson. It can change how you think about your role:
* Persistence in sharing your message > Perfection of message
* Many insightful nudges > A single comprehensive presentation
* Building relationships with your audience > Unassailable logic

Data Storytelling 2.0

I've been writing about data storytelling for a decade. The concept has grown in popularity; the underlying concepts haven't changed much.

Most courses or books will emphasize the same core concepts: focus on your audience, set up the conflict, lead your reader to resolution and action, use visualizations to deliver your messages.

These are good things if you want to convey a message with data. But if we were to put Data Storytelling on the Gartner Hype Cycle, it would sit somewhere beyond the "Peak of Inflated Expectations" and far short of the "Plateau of Productivity." People love Data Storytelling as a concept. They struggle to make it useful in their everyday work-life.

I think it is time to reconsider and reframe Data Storytelling to make it a useful tool in our modern workplace. A few examples:

Data Storytelling v1.0 --> 🆕 Data Storytelling 2.0

One-directional presentation to an attentive audience --> 🆕 Bi-directional dialogue to an attention-starved audience

The capstone to an analytics journey --> 🆕 A set of techniques used at every stage of the journey

Visuals and language will carry the message --> 🆕 Delivering to an audience must consider Where, When, Channels, Formats.

Comprehensive narratives --> 🆕 Insights as the essential unit of communication

Target your audience --> 🆕 ...and the people your audience will share it with

Data storytellers need a collection of skills --> 🆕 Specific data storytelling skills can be applied selectively by many people

How to Summarize Data using ChatGPT

We know that ChatGPT is remarkable at generating text. It is also a powerful tool for summarizing text. It can compress a long article down to the CliffsNotes version in an instant.

How does it do with data? With some prompting guidance, I was able to teach ChatGPT an approach for summarizing a data table. Understanding what you are working with in data is often the first step before diving into analysis. I was impressed with the results once I walked ChatGPT through my general thought process.

I started with this prompt:

Step 1. Describe what each row in the data set represents.

Knowing what you are working with in a data set starts at the row level. I found ChatGPT was exceptional at identifying the meaning of the individual rows in my tests. For example:

Step 2. Change the data field labels to make them more human readable, use proper capitalization, expand out abbreviations, and remove non-alphabet and non-numeric characters.

Many data files arrive with column names written by DBAs that are hard to decipher. Take this collection of data fields:

  • FTResTuition

  • PTResTuition

  • FTNonResTuition

  • PTNonResTuition

If you are familiar with the data, these names may be obvious. Fortunately, for everyone else, ChatGPT is able to turn them into:

  • Full-time Resident Tuition

  • Part-time Resident Tuition

  • Full-time Non-Resident Tuition

  • Part-time Non-Resident Tuition

Step 3. Group the data fields by topic or other logical grouping. For each data field, identify if it is a metric, dimension, boolean, or date.

Finding similar concepts is another Large Language Model strength. When you are dealing with data tables with dozens of columns, it can be helpful to understand how those data fields fit together. Equally impressive is ChatGPT's ability to understand different data types.
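If you want to sanity-check the data types ChatGPT reports, a rough rule-based classifier is easy to write. This is a minimal sketch, not what ChatGPT does internally: the function names, the accepted boolean spellings, the YYYY-MM-DD date format, and the sample values are all assumptions made for illustration.

```python
from datetime import datetime

# Assumed spellings of boolean values; extend for your own data.
BOOLEAN_VALUES = {"true", "false", "yes", "no", "y", "n"}


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def is_date(text, fmt="%Y-%m-%d"):
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def classify_field(values):
    """Label a column of strings as boolean, date, metric, or dimension."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return "dimension"  # nothing to go on; fall back to the loosest type
    if all(v.lower() in BOOLEAN_VALUES for v in present):
        return "boolean"
    if all(is_date(v) for v in present):
        return "date"
    if all(is_number(v) for v in present):
        return "metric"
    return "dimension"


# Made-up sample columns (only the FTResTuition name comes from the list above).
columns = {
    "FTResTuition": ["12000", "9500.50", ""],
    "Accredited": ["Yes", "No", "yes"],
    "Founded": ["1901-09-01", "1955-01-15"],
    "State": ["TN", "GA", "TN"],
}
labels = {name: classify_field(values) for name, values in columns.items()}
print(labels)
```

Rules like these only look at the values. Unlike a language model, they cannot tell that a numeric ZIP code is really a dimension, which is part of why ChatGPT's reading of the column names is useful.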

Step 4. For each metric data field, show the highest and lowest value in parentheses. For each date field, show the earliest and latest date in parentheses. For each dimension, show the most frequently occurring value in parentheses

It can be really helpful to get a sense of your data by seeing the range of values and common values.
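The same ranges and common values can be computed directly, which gives you a way to verify what ChatGPT reports. A minimal sketch using only the Python standard library follows; the function names and sample values are invented for illustration, and each column is assumed to contain at least one non-empty value.

```python
from collections import Counter
from datetime import date


def metric_range(values):
    """Return (lowest, highest) for a column of numeric strings."""
    numbers = [float(v) for v in values if v.strip()]
    return min(numbers), max(numbers)


def date_range(values):
    """Return (earliest, latest) for a column of ISO dates (YYYY-MM-DD)."""
    dates = [date.fromisoformat(v.strip()) for v in values if v.strip()]
    return min(dates), max(dates)


def most_common_value(values):
    """Return the most frequently occurring non-empty value in a column."""
    present = [v.strip() for v in values if v.strip()]
    return Counter(present).most_common(1)[0][0]


# Made-up sample columns; empty strings stand in for blank cells.
tuition = ["12000", "9500", "", "30500"]
founded = ["1901-09-01", "1955-01-15", "1887-03-30"]
state = ["TN", "GA", "TN", ""]

tuition_range = metric_range(tuition)
founded_range = date_range(founded)
top_state = most_common_value(state)
print(tuition_range, founded_range, top_state)
```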

Step 5. Identify any data fields that have many null or empty values. Label these data fields as "null or empty". Also, identify any data fields that have all the same value. Describe these data fields as "uninteresting"

Finally, data tables with lots of columns often have a lot of cruft: the blank or poorly populated fields that are better pushed aside as you think about where you want to focus.
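For comparison, the cruft check in Step 5 can be sketched in a few lines of Python. The prompt leaves "many" undefined, so the 50% threshold here is an assumption, as are the function name and the sample columns.

```python
def flag_fields(columns, null_threshold=0.5):
    """Flag columns that are mostly empty or hold a single repeated value.

    columns maps a field name to its list of string values.
    null_threshold is an assumed cutoff for "many" empty values.
    """
    flags = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None and v.strip()]
        null_share = 1 - len(present) / len(values) if values else 1.0
        if null_share >= null_threshold:
            flags[name] = "null or empty"
        elif len(set(present)) == 1:
            flags[name] = "uninteresting"
    return flags


# Made-up sample columns; empty strings stand in for blank cells.
columns = {
    "Fax": ["", "", "555-0100", ""],
    "Country": ["US", "US", "US", "US"],
    "State": ["TN", "GA", "TN", "NY"],
}
flags = flag_fields(columns)
print(flags)  # {'Fax': 'null or empty', 'Country': 'uninteresting'}
```

Fields that are not flagged, like State here, are the ones worth carrying forward into analysis.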

After defining all these steps, I played around with how I wanted it to render the results. I ultimately decided to consolidate steps 2 through 4, and suppress ChatGPT’s inclination to be verbose about the instructions. Here’s the final prompt that I landed on:

I want you to use the following Data Summarization process on a data set:

Step 1. Describe what each row in the data set represents

Step 2. Change the data field labels to make them more human readable, use proper capitalization, expand out abbreviations, and remove non-alphabet and non-numeric characters. Group the data fields by topic or other logical grouping. For each data field, identify if it is a metric, dimension, boolean, or date. For each metric data field, show the highest and lowest value in parentheses. For each date field, show the earliest and latest date in parentheses. For each dimension, show the most frequently occurring value in parentheses

Step 3. Identify any data fields that have many null or empty values. Label these data fields as "null or empty". Also, identify any data fields that have all the same value. Describe these data fields as "uninteresting"

When you show the results, you can write the Step number but don’t need to include the step description. Are you ready for some data?

After pasting that full prompt into the chat window, I simply copied and pasted a chunk of data from Excel to get a result that looks like this:


Celebrating Women in Data Visualization & Storytelling

March is Women’s History Month, and we have been celebrating all month long on social media and as an organization! We wouldn’t be the same company and our industry wouldn’t be what it is without the amazing women in each! We wanted to take some time at the end of the month to celebrate female pioneers and influential women in data visualization and storytelling!

Florence Nightingale:

Florence Nightingale is considered to be one of the first pioneers of data visualization. While she’s best known for her advancements in nursing, she is also credited with being one of the most influential early figures to not just use data, but to show it in a way that could impact and move her readers - who were ordinary people and even Queen Victoria herself. Nightingale was known for her love of statistics. During her time working in a military hospital, she helped to prove that the hygiene and cleanliness of the hospitals were directly linked to soldier deaths, most of which came from preventable disease rather than combat. She used her experience in nursing and love of statistics to take the data and information that were collected and turn them into charts and graphs like the one below. However, because she was a woman in the 1800s, she isn’t adequately credited for her advances in data visualization along with the “founding fathers” we are more familiar with.

Lea Pica:

Lea Pica is known worldwide as a data presentation guru, or as she describes herself, “Let me be your Slide Sherpa. Your Viz Vizier. Your guide on the exciting road to presentation enlightenment.” Pica used her experience in musical theatre to bring a “performance” aspect to her professional career. But try as she might, she realized that even all of the bells and whistles she thought would help her successfully grab attention were falling flat. She became a self-taught visualization expert, and now she’s among the ‘leading ladies’ of the data visualization and presentation world!

Amanda Cox:

Amanda Cox is an American journalist and data visualization expert who is well known for her work as a data designer at the New York Times, where she rose to serve as editor of The Upshot section. She worked as a graphics editor at the NYT from 2005 through 2016, and her desk created the infamous election monitoring needle we have seen from the NYT every election cycle since 2016.

Cox is known as the “Michael Phelps of infographics,” a title we are quite fond of! In the opening of her keynote at the OpenVis Conference in 2013, she famously said that ultimately design isn’t about typography or whitespace, but rather empathy - it’s about creating visualizations that readers can both understand and connect to emotionally. Since Cox's tenure, the Times has "led the field of innovative information graphics" and "raised the bar of journalistic interactive visualization."

She has also served as the judge for data visualization competitions, and several of her data visualizations were selected for The Best American Infographics 2014 and The Best American Infographics 2016. It’s easy to see why we would include her in this list of influential women who are cemented into the history of women in data visualization.

Emma Willard:

Emma Willard is probably best known for her visually stunning maps and for being America’s first female map maker. Her Temple of Time visualization, which she shaded by hand, details the timeline of world history. She used a flow diagram to showcase the rise and fall of empires throughout history. Willard described her reasoning for this visualization in this way: “By putting the course of time into perspective, the disconnected parts of a vast subject are united into one, and comprehended at a glance;–the poetic idea of “the vista of departed years” is made an object of sight; and when the eye is the medium, the picture will, by frequent inspection, be formed within, and forever remain, wrought into the living texture of the mind.”

Creating an Alternative Law School Rankings Report

The New York Times recently published a story, “Defending Its Rankings, U.S. News Takes Aim at Top Law Schools” (paywalled), about how Law Schools are fed up with the US News & World Report rankings, and how the magazine is fighting back. I was particularly struck by this passage:

Ms. Gerken, the Yale Law School dean, and other participants suggested that the data gathered by the American Bar Association already provided good information for prospective applicants. The data provided on the bar association website, however, does not allow someone to easily compare one law school with another, and it lacks the emotional punch of number rankings like the one used by U.S. News.

Another sad case of good data stuck in bad formats like Excel downloads and antiquated interfaces. Fortunately, it is a problem that is very fixable with Juicebox.

We created an alternative Law School Comparison site using data from the American Bar Association and the AccessLex Institute. With this type of interactive report, we think about a few key things:

  • How do we provide interactivity so the user can make the results most relevant to their needs?

  • How do we give users a workflow through the data to support their exploration?

  • How can we guide and narrate this journey with good descriptions, labels, and visual indicators?

When Law Schools and the American Bar Association are ready to break free of the tyranny of US News & World Report (but still recognize that data transparency is important for decision-making), they know who to call. Check it out 👇

Story Endings Are Hard

“Endings are hard” is the subject of a recent episode of Malcolm Gladwell’s podcast Revisionist History. He shares a live stage with comedian Mike Birbiglia, an extremely accomplished storyteller in his own right.

Together they bemoan the inadequacy of many story endings. Gladwell compares how we evaluate people and how we evaluate stories. Unlike our snap judgements about people,

…our evaluation of stories is the opposite. It's back loaded. What happens in the last five minutes colors every conclusion we drew in the first two hours. I will guarantee you that every screenwriter and author and podcaster frets endlessly about how their stories begin, rewrites the beginning a million times, but aren't nearly as fastidious about the ending, which is nuts.

As he is known to do, Gladwell arrives at a succinct and unifying theory:

The difference between a story and an anecdote is a story is a narrative that betrays the listener's expectations. There must be an active betrayal for the story to work.

When we teach about data storytelling (check out our free lessons), we focus on using the powerful techniques of narrative to reach our audience and change minds. We want to connect by touching on ingrained concepts like setting up the conflict, connecting ideas with a logical flow, establishing characters, and using specificity.

This discussion of endings provides another guideline to consider with your data story. By the end of your story, are you subverting expectations?

This concept connects to the recent dialogue about “what is an insight?” One suggestion is that insights need to break through an existing understanding or assumption. That is, they need to “betray expectations.”

In contrast, an anecdote merely reinforces what we already believe. Anecdote-style data communication has a place, especially if you are trying to educate people in your organization. You don’t always need to be exploding their minds with a new insight — sometimes you just want your audience to take actions that are consistent with something that is known.

At one point in the podcast, Gladwell provides an example that helps solidify his distinction:

An anecdote is a narrative that conforms with your expectations. For example,

So the craziest thing happened to me last night. I found a hundred dollars bill on the street. That is not a story, that is an anecdote. The first sentence craziest thing happened is the equal of the second sentence, a hundred dollars bill on the street.

A story is a narrative that betrays the audience's expectations.

The craziest thing happened to me last night. I found a hundred dollars bill on the street. I gave it to a, tried to give it to a homeless man and he said, “I don't want your effing money.”

Tech Layoffs, Visualized

The last few months have been difficult for technology workers. It seems like every week, we hear about a blue-chip tech company laying off thousands of employees. Crunchbase has been tracking US-based technology layoffs here. But an ever-growing table like the one below doesn’t exactly tell the story or reveal trends.

Crunchbase data on Tech Layoffs, 2022/2023

There’s obviously a lot of value hidden in this data, so we pointed Juicebox at it to discover (and share) some of those hidden insights. The interactive report we built is embedded below, but here are some things we captured during our exploration:


Here’s the embedded report so you can explore the data for yourself. Start scrolling: