How to Find Outliers in Your Research Data

February 6, 2021

The word “outlier” can mean different things depending on the context, but with data, it almost always means the same thing: the data point or points that don’t match up with the rest. When you’re collecting data for a research project, you’ll notice that your measurements start to cluster around an average. If you were to put your research data on a scatter plot, you could draw a trend line through the points to see the direction they’re heading.

But sometimes, one of the measurements you tracked during your experiment or study is very different from the others. That’s the outlier, and if you include it in your calculations, it can pull the data’s average well away from where most of the points on your graph or chart sit.
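To see how much a single outlier can move an average, here is a minimal Python sketch. The readings are made up for illustration, and the median is shown next to the mean only as a point of comparison, since it is less sensitive to one extreme value.

```python
from statistics import mean, median

# Hypothetical measurements; the last value is the outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]

# Everything except the outlier.
typical = readings[:-1]

mean_without = mean(typical)    # about 10.43
mean_with = mean(readings)      # 15.375
median_with = median(readings)  # 10.5

print(f"Mean without the outlier: {mean_without:.2f}")
print(f"Mean with the outlier:    {mean_with:.2f}")
print(f"Median with the outlier:  {median_with:.2f}")
```

One stray reading moves the mean from roughly 10.4 to roughly 15.4, while the median stays close to where most of the points are.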

That’s the danger of an outlier. It can twist data measurements so that they’re less accurate than what you should be expecting. If your research data is for a product, service, or report, you don’t want to ignore outliers. They will almost always appear, but you want to separate them from the pack. You want to know the regular average and how many outliers you should expect to disagree with that average.

Consequences of Not Catching Research Outliers in Your Data

We stated that when you don’t catch and take note of your research outliers, they skew your data. This is a serious problem when you’re using the data to make a decision about a product, service, or scientific development.

For example, an outlier can change your expectations for a product. When you’re researching how a target audience will react to your new product, one or more outliers can skew your picture of that reaction. You may think your data supports sending more or less stock to a certain audience than it really does. That’s revenue lost by overstocking or by not meeting demand. The same can be said for services.

For scientific research, not catching an outlier can make your thesis appear false or true when it’s the opposite. Whether for a grade, a future product, or a paper you want to publish, if you present this information and data with the outliers and the skew, someone else will eventually discover it and tear apart your hard work.

How to Find Outliers in Your Marketing Data

To avoid the consequences of research outliers, we have to know how to find them. To find them, we have to know the types and what they look like. There are three types of outliers that you can find in research data, some of which are common only in certain kinds of data.

Global Outliers

Global outliers are among the easiest to find in your research data. They are points where the measured value doesn’t match anything else in the study or experiment. In fact, these points are so far outside the area of distribution that if your margin of error were doubled, they likely still wouldn’t be close to falling into it.

To find them, decide what your margin of error is. A margin of error is the amount that you allow for miscalculations and changes of circumstance. If you’re recording physical measurements, such as the height of people, your margin of error might allow a person to be off by 1 or 2 inches when reporting their height.

If you’re working with business data, you want a margin of error that accounts for miscalculations in sales numbers, consumer interest, etc. Either way, if a data point far exceeds your margin of error, it’s likely a global outlier.
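As a rough illustration of that check, the Python sketch below flags any value that sits further from the middle of the data than a chosen margin of error allows. Several things here are assumptions rather than fixed rules: the function name `find_global_outliers` is hypothetical, the median is used as the “middle”, the default multiplier of 2 simply echoes the doubled margin mentioned above, and the heights are invented.

```python
from statistics import median

def find_global_outliers(values, margin_of_error, multiplier=2.0):
    """Return the values that fall outside the widened margin of error.

    Assumptions: the median stands in for the typical value, and a point
    counts as a global outlier only if it is more than
    margin_of_error * multiplier away from it.
    """
    center = median(values)
    limit = margin_of_error * multiplier
    return [v for v in values if abs(v - center) > limit]

# Hypothetical reported heights in inches; 90 is clearly out of place.
heights_in = [66, 67, 65, 68, 66, 67, 90]

outliers = find_global_outliers(heights_in, margin_of_error=2)
print(outliers)  # [90]
```

The margin and the multiplier are judgment calls that depend on your study, so treat the output as a list of points to look at more closely, not as a final verdict.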

Contextual Outliers

A contextual outlier is a data point that departs from the pattern for a clear reason. For example, if you’re following sales for a product, and the sales skyrocket in December and then dip in January, you know that it’s because of holiday sales. There’s context as to why the trends changed as they did. A contextual outlier like this can remain in your research data as long as you explain the change and account for it.

Contextual outliers can be less dangerous to your results than others because there is a known reason for them. If you know they’re caused by something other than random chance, you can incorporate that cause into your results.

Some contextual outliers are technically also global outliers, but once they have context, it is better to refer to them as contextual outliers.
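One way to picture the difference is to judge each data point twice: once against all of the data, and once against other points that share its context. The Python sketch below does that under a few assumptions: the function name `flag_with_context`, the “regular” and “holiday” labels, the margin of 10, and the sales figures are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def flag_with_context(records, margin):
    """Label each (context, value) record.

    "typical":     within the margin of the overall median
    "contextual":  outside it, but within the margin of its own context's median
    "unexplained": outside both
    """
    overall = median(value for _, value in records)

    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    results = []
    for context, value in records:
        if abs(value - overall) <= margin:
            label = "typical"
        elif abs(value - median(by_context[context])) <= margin:
            label = "contextual"
        else:
            label = "unexplained"
        results.append((context, value, label))
    return results

# Hypothetical monthly unit sales, tagged by whether the month is a holiday month.
sales = [
    ("regular", 100), ("regular", 105), ("regular", 98), ("holiday", 180),
    ("regular", 102), ("regular", 99), ("holiday", 175), ("regular", 160),
]

labeled = flag_with_context(sales, margin=10)
for row in labeled:
    print(row)
```

Here the two holiday months are far from the overall pattern but agree with each other, so they come out as contextual, while the 160 in a regular month has no such explanation. Note that a context with only one data point will always agree with itself, so this kind of check only makes sense when each context has several points.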

Collective Outliers

This type of outlier can be the hardest to identify. This is where a number of data points deviate from the pattern, but mostly in the same way. If you have a sample size of 100 data points, where 60 to 70 follow a pattern, and 15 to 25 follow their own pattern, with a few global outliers, you have a big problem. You can no longer accurately discern the data’s pattern.

You need to figure out what caused the two different paths and which is the real pattern. In these cases, there is usually a catalyst in the study, survey, or experiment that pushed some of the data points down a different path. You have to dive deep into how you collected your data and look for that catalyst to discern which path actually reflects what you set out to measure.

Do not assume that the pattern with the most data points is the correct one. In many cases, if you have collective outliers, there was a problem with how you collected your data. When that is the case, it’s possible that you don’t have any useful data at all or that you skewed the majority of what you recorded. It’s not uncommon to have to collect your data again to fix this problem.
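When you go back through how the data was collected, one simple check is to split the data points by a suspected catalyst and compare the groups. The Python sketch below is only an illustration: the function name `compare_groups`, the “online” and “phone” collection channels, the margin of 1, and the survey scores are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def compare_groups(records, margin):
    """Group (group_name, value) records and compare the group medians.

    Returns the median of each group and whether the gap between the
    highest and lowest group median is larger than the margin.
    """
    groups = defaultdict(list)
    for group_name, value in records:
        groups[group_name].append(value)

    medians = {name: median(values) for name, values in groups.items()}
    gap = max(medians.values()) - min(medians.values())
    return medians, gap > margin

# Hypothetical satisfaction scores, tagged by how each response was collected.
responses = [
    ("online", 7), ("online", 8), ("online", 7), ("online", 6),
    ("phone", 3), ("phone", 2), ("phone", 3),
]

medians, diverges = compare_groups(responses, margin=1)
print(medians)
print("Groups diverge:", diverges)
```

If the groups diverge, the collection channel is a candidate for what split the data into two paths. It does not tell you which path is the real one; that still comes from reviewing how each group’s data was gathered.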

Work with Focus Forward for Data Research

Focus Forward offers an array of services that can assist you with your data collection. These services include recruitment for virtual and in-person studies, best-in-class transcripts for your data records, and engaged project management to keep data collection on track.
For more information and assistance with your project, contact us at Focus Forward now.