Four data visualization mistakes you’re probably making and how to fix them
How to avoid misleading your readers with your charts
by Yaning Wu and Mafe Callejón
As data visualization practitioners, we use data to communicate, educate and inspire. Sometimes, though, our charts can display information in ways that are confusing or simply inaccurate.
Misleading graphs can easily spread misinformation or erode your viewers' trust. To help you avoid such outcomes, today we focus on four common data visualization mistakes and how you can fix them.
Mistake #1: Choosing the wrong chart type
Keep in mind that function should always come before form. Your first goal should be to visualize the data in the correct format, not the flashiest one. If you’re torn between different chart types, ask yourself: “What am I trying to show?” That should help you choose the right graph.
To illustrate this, here’s an example of a dataset about the population of the quokka, an elusive Australian mammal, in Australia’s Mornington district between the years of 1992 and 1996. This is time series data, or data that shows variation across time, so visualizing change is our most important task here.
The following scrolly shows the process of choosing the right chart for our data.
A pie chart, which shows part of a whole, might sound like a good idea at first. However, this doesn't allow the reader to get a clear idea of the differences between the years and doesn't convey the fact that this data changes over time.
After all, the quokkas counted over these four years may have been the same individuals, so adding up their total population might not be accurate.
Line charts are the go-to chart type for time series. However, they work best with evenly spaced time intervals. That is, each tick on the axis should represent the same unit of time (a day, a month, a year).
Our sample data deals with uneven intervals, meaning that joining dots with a continuous line might actually be a bit misleading.
In this case, the best way to visualize our data is with a column chart. Here the bars' heights are scaled to show the total quokka population per year, which is more effective than the pie chart.
Compared to the line chart, the bars are not connected to each other, so the idea that the time blocks are separate is slightly reinforced. Meanwhile, this graph still allows for easy comparisons over time.
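The choice between a line chart and a column chart above hinges on whether the time intervals are evenly spaced. As a rough sketch of that check in Python (the years below are made up for illustration and are not the quokka dataset, and `is_evenly_spaced` and `suggest_chart` are hypothetical helper names):

```python
def is_evenly_spaced(time_points):
    """Return True when consecutive time points are the same distance apart."""
    gaps = [later - earlier for earlier, later in zip(time_points, time_points[1:])]
    return len(set(gaps)) <= 1


def suggest_chart(time_points):
    """Suggest a chart type following the rule of thumb described above."""
    return "line chart" if is_evenly_spaced(time_points) else "column chart"


# Illustrative years only, not the actual survey years.
even_years = [1992, 1993, 1994, 1995, 1996]
uneven_years = [1992, 1993, 1995, 1996]

print(suggest_chart(even_years))    # line chart
print(suggest_chart(uneven_years))  # column chart
```

This is only a rule of thumb: a line chart can still work for uneven intervals if the time axis is drawn to scale, but the check is a quick way to notice when the question needs asking.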
Mistake #2: Exaggerating your data with your axes
Axes allow us to show the scale used to position chart elements. However, in some cases axes can mislead your viewers.
Take a look at the following dataset of race times for the women’s 100-meter breaststroke SB5 final in the Rio Paralympics:
Now, let’s learn how axis manipulation can cause more harm than good and how to fix it.
To highlight the margin of victory of Yelyzaveta Mereshko over her peers, we might build a bar chart with a non-zero baseline, meaning that the axis starting value is not zero. Remember that the shorter the bar, the faster the swimmer.
This looks reasonable, right? The winner has the shortest bar as she swam the fastest.
However, this chart gives the impression that Mereshko was more than six times faster than the last finisher in the race, Italian swimmer Emanuela Romano. This is because a non-zero baseline inflates the differences between the values.
Including 0 on the axis ensures that the bar lengths are proportional to their values. Now, it's clear that Mereshko was over ten seconds faster than her Italian competitor (still a considerable advantage).
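The distortion can be quantified by comparing bar lengths, which are measured from the axis baseline rather than from zero. Here is a minimal sketch, using made-up times rather than the actual race results (`bar_length_ratio` is a hypothetical helper):

```python
def bar_length_ratio(slow_value, fast_value, baseline=0.0):
    """Ratio of the two drawn bar lengths when the axis starts at `baseline`."""
    if baseline >= fast_value:
        raise ValueError("baseline must be below the smaller value")
    return (slow_value - baseline) / (fast_value - baseline)


# Hypothetical times in seconds, chosen only to illustrate the effect.
winner, last = 100.0, 110.0

print(bar_length_ratio(last, winner, baseline=98.0))  # 6.0
print(bar_length_ratio(last, winner))                 # 1.1
```

With the truncated axis, the last bar is drawn six times as long as the winner's, even though the underlying value is only 10% larger.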
If you want to emphasize the time difference between competitors, then turning the bar chart into a stacked bar might be a good alternative to cutting off the axis.
You could also follow a different approach and switch the bar chart for a dot plot using our “Scatter” template.
The dot plot removes the length of the bars as a variable, which helps the user focus on the value that's determined by the position of the dots on the grid.
In this version, you can still show the chart with the truncated axis, but we encourage you to use zero-based axes whenever possible, at least for the first view of the chart.
To sum up:
✅ Almost always set your axes to start at 0.
❌ Don’t force conclusions onto your readers. Let your data speak for itself.
Data visualization pitfalls aren’t exclusive to chart elements. Sometimes, the mistake might be in how you are framing the data you’re showing. Here are two mistakes that visualization practitioners can make by not taking enough care when examining their data.
Mistake #3: Not giving enough context to your readers
In terms of context, more is more. The more information you give to your reader, the better equipped they will be to understand your charts and message. For example, here’s some data on global temperature anomalies over time:
Now let’s see that in a chart:
Climate change and temperature anomalies are phenomena that have been evolving through time, so it's important that we give the reader enough information to correctly interpret the data.
Here, if we only plotted the temperature anomaly data for the last decade, the reader might get the impression that temperatures aren't really rising that much. Look! There was even a drop in 2018.
However, when expanding the chart to include the full available dataset from the mid-19th century, it's clear that anomalies have drastically increased since this data started being recorded.
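The effect of the chosen time window can be checked numerically before charting. The sketch below uses made-up anomaly values, not real measurements, and a hypothetical `net_change` helper to compare a short recent window with the full series:

```python
def net_change(values):
    """Difference between the last and first value in a series."""
    return values[-1] - values[0]


# Made-up anomaly values in °C for illustration; NOT real measurements.
anomalies = [-0.3, -0.2, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 0.8, 0.9]
recent = anomalies[-3:]

print(round(net_change(recent), 2))     # 0.0
print(round(net_change(anomalies), 2))  # 1.2
```

In this toy series the recent window looks flat, while the full series shows a clear rise. That is the kind of difference a narrow time range can hide.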
You can also improve your charts and add context through annotations, headings and footnotes.
To sum up:
✅ Use headers, annotations, highlights and footnotes to add information that helps your reader.
❌ Show, don’t tell: don’t clutter your visualization with excessive amounts of text.
Mistake #4: Confusing correlation with causation
Our last common mistake is to confuse correlation (the extent to which two variables change together) with causation (the cause-and-effect relationship between two variables). This happens a lot, misleading readers into thinking that one variable drives another when, in fact, it does not.
This website has a good collection of examples of coincidental or spurious correlations. Although these examples might seem over the top (would anyone really think that the number of people who died by getting entangled in their bedsheets is related to per capita cheese consumption?), they illustrate the point: false relations lead to inaccurate conclusions.
To illustrate this, here’s a snippet of a dataset containing countries’ cumulative COVID-19 cases until May 3rd, 2022 and their 2017 GDP per capita:
The best way to display the relationship between these two variables would be a scatter plot, showing GDP and COVID cases on the X and Y axes respectively.
At first glance, you might jump to the conclusion that lower GDP results in countries having fewer COVID-19 cases. Makes sense, right? The chart clearly shows that poorer nations have been less affected.
Wrong! This is a classic case of mixing up correlation and causation. There is a clear relationship between the variables (correlation) but this doesn't mean that one factor drives the other (causation).
There may be other factors intervening in this relation that mean that there isn't a direct relationship between GDP and coronavirus severity. For example, countries with lower GDPs could have younger populations in general.
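To see how a third factor can produce a correlation on its own, here is a small sketch with synthetic numbers (not the GDP or COVID-19 data). A hidden variable `z` drives both `x` and `y`, which never influence each other. The partial correlation used here is one common way to control for a third variable. It is offered as an illustration, not as the method behind the chart above, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after controlling for a third variable zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Synthetic data: z drives both x and y; the added noise terms are unrelated.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2 * v + e for v, e in zip(z, [1, -1, -1, 1, 1, -1, -1, 1])]
y = [3 * v + e for v, e in zip(z, [1, 1, -1, -1, -1, -1, 1, 1])]

print(round(pearson(x, y), 2))        # 0.97, a strong raw correlation
print(partial_correlation(x, y, z))   # close to 0 once z is accounted for
```

The raw correlation between `x` and `y` is strong, yet it all but disappears once `z` is held fixed. In real data the confounders are rarely known this neatly, so a check like this can suggest a hidden factor but cannot rule one out.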
So, to avoid misleading readers about a correlation that may be disguising other relationships, we can simply change the title to reflect these nuances.
To sum up:
✅ Look at your data carefully to understand all the driving factors between variables.
❌ Don’t jump to rushed conclusions and don’t let your personal biases affect your analysis.
Why it matters
Data visualization tools have enabled more people to build and showcase charts in all media, but they’ve also made it easier for data to be used in careless or dishonest ways. At Flourish, we aim to educate each of our users on the basics of data literacy so they can responsibly make the most out of our tool. Knowing the do’s and don’ts of data visualization makes you a good data viz practitioner and it also helps you spot biased charts in the wild.