An outlier is a value that is significantly higher or lower than most of the values in your data. When using Excel to analyze data, outliers can skew the results. For example, the mean average of a data set might truly reflect your values. Excel provides a few useful functions to help manage your outliers, so let’s take a look.
A Quick Example
In the image below, the outliers are reasonably easy to spot—the value of two assigned to Eric and the value of 173 assigned to Ryan. In a data set like this, it’s easy enough to spot and deal with those outliers manually.
In a larger set of data, that will not be the case. Being able to identify the outliers and remove them from statistical calculations is important—and that’s what we’ll be looking at how to do in this article.
How to Find Outliers in your Data
To find the outliers in a data set, we use the following steps:
- Calculate the 1st and 3rd quartiles (we’ll be talking about what those are in just a bit).
- Evaluate the interquartile range (we’ll also be explaining these a bit further down).
- Return the upper and lower bounds of our data range.
- Use these bounds to identify the outlying data points.
The cell range on the right of the data set seen in the image below will be used to store these values.
Let’s get started.
Step One: Calculate the Quartiles
If you divide your data into quarters, each of those sets is called a quartile. The lowest 25% of numbers in the range make up the 1st quartile, the next 25% the 2nd quartile, and so on. We take this step first because the most widely-used definition of an outlier is a data point that is more than 1.5 interquartile ranges (IQRs) below the 1st quartile, and 1.5 interquartile ranges above the 3rd quartile. To determine those values, we first have to figure out what the quartiles are.
No comments:
Post a Comment