Outliers get a bad rap in data.
Most of the time they’re the first thing you want to get rid of. Or they’re relegated to the outermost regions of box-and-whisker plots, beyond the whiskers. When you see them on your scatter plots, it just looks like you dripped a spot of coffee onto your printout. “What, that? It’s nothing--it’s just an outlier. Come on, let’s focus on the average instead.”
But to paraphrase the economist Robert Shiller, “the outliers matter the most.” Let’s talk about the instances where the outliers are the stars of the show in your financial data. When they’re the reason you bought a ticket, dragged yourself out of bed, and showed up.
When that’s the case, you want to find that outlier and put it on a stage. Here are two ways to track them down and then show them off.
Highlight them in Excel
One of the easiest ways to find statistical outliers is to calculate what counts as an outlier in Excel and then highlight them in your data with conditional formatting.
There are a few different definitions of what a statistical outlier is. Here are a few options:
- The interquartile range (IQR) definition: Anything outside of an upper and lower bound based on the quartiles in your data is an outlier. You can learn the nitty-gritty of how to calculate the bounds from this Dummies article or this How-to Geek article.
- The Z-score definition: Anything that’s three standard deviations away from the mean of your data is considered an outlier. More on the Z-score calculation from this How-to Geek article.
Let’s dive into an example of both using this audit fee data from idaciti.
After downloading the data, you’ll see that we have revenue and audit fee figures for each company. I’ll apply the IQR method to revenue and the Z-score to audit fees so we can see both in action. (If you’re not sure how to, check out the links above.)
After applying the formulas and using conditional formatting to highlight what’s outside of the bounds, this is what we get:
With the cells highlighted, it’s easy to tell that Apple, Exxon Mobile, UnitedHealth, and WalMart are revenue outliers, whereas GE is the only outlier in terms of audit fees.
Visualize bivariate data with a scatter plot
But let’s say you don’t just want to look at numbers--you want to see a visual of how much of an outlier something is.
And you don’t want to look at just revenue or audit fees alone--you want to see how they work together (which, if you care, the term for is “bivariate” because there are two variables).
In that case, there’s nothing better than your basic scatter plot.
When it comes to outliers in financial data, you always want to be aware of them. Whether they’re positive or negative, an outlier for something like revenue or fees usually means one of two things: either the data’s wrong, or there’s something to be learned. And now, with scatter plots, Z-scores, and IQR, it’ll be easier than ever to identify them.
Data in this post is found in this idaciti card. Download the data by “flipping” the card by clicking on the 🔁 icon in the top right-hand corner. In the graph view, you can see data for other top 20 companies by clicking on the filter. In the data download view, you can also see more details by clicking on the data points.