When You Should (and Shouldn’t) Rely on Correlation

The march to data-driven marketing in recent years has been as relentless as the flow of lava down the sides of an erupting volcano.

The use of data in marketing is by no means new, but marketers now have access to a vast amount of data about customers and potential buyers. Equally important, they have access to powerful and affordable analytics technologies.

Today, it’s nearly impossible to find a marketer who doesn’t think using the right data in the right ways can improve marketing performance.

Much of the heavy lifting in marketing data analysis involves correlation. In simple terms, correlation is a relationship between phenomena or things – “variables” in the lingo of math and statistics – that tend to vary or occur together in ways that aren’t due to chance alone.

It’s not surprising that correlation plays such a central role in marketing analytics. A single data point can provide useful information, but the real power of analytics is its ability to identify and quantify relationships between two or more “variables” in your marketing data. Understanding these relationships can enable marketers to make decisions that improve marketing performance.

Correlation ≠ Causation

One of the fundamental principles of data analysis is that correlation does not establish causation. In other words, data analysis may show that two events or conditions are strongly correlated statistically, but this alone doesn’t prove that one of the events or conditions caused the other.

The following chart provides an illustrative example of why marketers must never forget the distinction between correlation and causation. It shows that from 1999 through 2009 there was a strong correlation (r = 0.99789126, for you data geeks) between US spending on science, space, and technology and the number of suicides by hanging, strangulation, and suffocation. (Note: To see this and other nonsensical correlations, take a look at Spurious Correlations.)

Source: Tyler Vigen, Spurious Correlations
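The symbol r conventionally denotes the Pearson correlation coefficient, which runs from -1 (perfectly opposite movement) to +1 (perfectly matched movement). The minimal Python sketch below computes it by hand and applies it to two made-up yearly series (not the data behind the chart). Both series simply drift upward over the same eleven years, and that shared time trend alone is enough to push r close to 1, even though neither series is computed from the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly values for 1999-2009: two unrelated quantities that both
# grow steadily over time, each with its own small, independent wiggle.
wiggle_a = [0, 1, -1, 0, 1, 0, -1, 1, 0, -1, 0]
wiggle_b = [2, -1, 0, 1, -2, 1, 0, -1, 2, 0, -1]
series_a = [10 + 2 * year + w for year, w in enumerate(wiggle_a)]
series_b = [50 + 5 * year + w for year, w in enumerate(wiggle_b)]

print(f"r = {pearson_r(series_a, series_b):.3f}")
```

A high r here says only that the two series rose together; it says nothing about why.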

I doubt any of us would argue that there’s a causal relationship between these two variables, despite the strong correlation, because there’s no plausible way one could affect the other. In marketing, however, it’s easy to encounter events that are strongly correlated and have a plausible cause-and-effect relationship. The problem is that the causal relationship, while plausible, can be weak or nonexistent.

When To Rely On Correlation

It’s preferable, of course, to base marketing decisions and actions on proven cause-and-effect relationships, but this may not always be realistic or even possible. Proving the existence of a causal relationship typically requires the use of a well-designed and tightly controlled experiment. In marketing, such experiments can be easy to conduct in some situations, but difficult, if not impossible, to run in others.

Under these circumstances, the real question is: When should marketers act based on a correlation?

David Ritter with the Boston Consulting Group described a process for answering this question in an article published on the Harvard Business Review website a few years ago. I’ve used Ritter’s process – with a couple of minor modifications – numerous times in my work with clients, and I’ve found it to be effective at focusing the attention of decision-makers on the right issues.

The diagram below is my adaptation of Ritter’s framework.

Whether you should rely on a correlation depends primarily on two factors – your confidence in the correlation as an indicator of cause and effect, and the balance of risks and rewards.

Confidence in the correlation – The first factor is your level of confidence that the correlation points to a real cause-and-effect relationship. This factor is in turn a function of two things:

How often the correlation has occurred in the past. The more frequently the events have occurred together, the more likely it is that they are causally related.

The number of possible explanations for the effect under consideration. For example, your data may show a strong correlation between the number of marketing emails sent and revenue growth during a given period. But if there are several plausible explanations for the increased revenue, you have less reason to think there’s a causal connection between the number of emails sent and revenue growth.

The balance of risks and rewards – The second factor in deciding whether to rely on a correlation is the balance of risks and rewards. Any decision based on a correlation should include an assessment of the potential risks and benefits of the action.

The above diagram illustrates how these two factors are used together to help you decide whether you should act based on a correlation.

I need to make two points about using this framework. First, it’s important to go through this analysis for each action you’re considering. When you identify a correlation, there will probably be several ways you could act on it. Evaluate each option separately, because the options will probably have different risk-reward profiles.

It’s also important to consider the size of the “gap” between the potential risks and rewards. For example, if an action offers large potential benefits at very low risk, you may want to act even if you aren’t very confident that the correlation indicates a cause-and-effect relationship.
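One hypothetical way to encode these two points is shown in the Python sketch below. Each candidate action gets its own scores and is judged on its own, and the confidence you require drops as the reward-minus-risk gap grows. The 0-1 and 0-10 scoring scales, the linear threshold rule, and the example actions are all assumptions made for illustration. This is not Ritter’s framework or the diagram above expressed as code.

```python
from dataclasses import dataclass


@dataclass
class CandidateAction:
    """One possible response to a correlation (all scores are hypothetical)."""
    name: str
    confidence: float  # 0-1: belief that the correlation reflects cause and effect
    reward: float      # 0-10: judged size of the potential benefit
    risk: float        # 0-10: judged size of the potential downside


def required_confidence(gap: float) -> float:
    """Assumed rule: the larger the favorable gap, the less confidence required.

    Falls linearly from 0.9 (tiny gap) to 0.2 (maximum gap of 10).
    """
    return 0.9 - 0.07 * min(max(gap, 0.0), 10.0)


def should_act(action: CandidateAction) -> bool:
    gap = action.reward - action.risk
    if gap <= 0:
        return False  # risks match or exceed rewards, so don't act on the correlation
    return action.confidence >= required_confidence(gap)


# Two hypothetical ways to act on the same email/revenue correlation,
# each evaluated separately because their risk-reward profiles differ.
options = [
    CandidateAction("Send one extra email per month", confidence=0.4, reward=9, risk=1),
    CandidateAction("Double the email budget", confidence=0.4, reward=7, risk=5),
]

for option in options:
    verdict = "act" if should_act(option) else "hold off"
    print(f"{option.name}: {verdict}")
```

With the same modest confidence (0.4), the low-risk option clears its threshold (about 0.34) because the gap is wide. The costlier option does not clear its threshold (0.76).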

Top image courtesy of Global Panorama via Flickr (CC).