By: Sathish Prasad, Roshni Balasubramanian, Jupiter Seong
Introduction
With so many new methods and techniques emerging in today's data world, it is easy to forget or overlook the foundations. One such foundational concept, and one that holds significant business value, is Causal Inference.
People often confuse causation and correlation, and this two-part article by Roshni, Jupiter and Sathish dives deep into causation. Part 1 covers the difference between causation and correlation and introduces some key concepts and techniques of causal inference. Part 2 will cover the practical implementation of these concepts.
This article targets an audience with little or no statistical knowledge, so many explanations are simplified. The authors tried to avoid jargon such as regression, endogeneity, and coefficients as much as possible.
What’s the Difference between Causation and Correlation?
Before we dive deep into causation, let's first understand the difference between causation and correlation. Correlation, in simple terms, is a statistical association between two variables: changes in one variable tend to accompany changes in the other. It does not imply causation.
Causality, on the other hand, denotes a cause-and-effect relationship between two variables. If changes in one variable reliably result in changes in another, a causal relationship may exist. Establishing causation requires more rigorous evidence.
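To make "statistical association" concrete, the sketch below computes the Pearson correlation coefficient, one common measure of correlation (the article itself does not commit to a specific measure), from scratch. The figures are made up for illustration. A value near +1 only says the two variables move together; it says nothing about which, if either, causes the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two variables around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily figures, for illustration only
umbrellas_sold = [12, 30, 45, 8, 50, 22]
car_accidents = [3, 6, 9, 2, 10, 5]
print(round(pearson_r(umbrellas_sold, car_accidents), 3))
```

Values range from -1 (perfect negative association) through 0 (no linear association) to +1 (perfect positive association).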
Let's consider an example to understand why correlation does not mean causation, using umbrella sales and car accidents during a rainy month as our two variables.
Suppose there is a positive correlation between increased umbrella sales and the number of car accidents in a city. The data shows that as umbrella sales rise, so does the number of car accidents.
However, concluding that buying more umbrellas causes an increase in car accidents would be a misinterpretation. In reality, both are influenced by a common factor: rainfall. When it rains, people are more likely to buy umbrellas, and the wet conditions can contribute to more car accidents.
The correlation between umbrella sales and car accidents is spurious: there is no direct causal link between purchasing umbrellas and a rise in accidents. Rainfall is the underlying factor driving both variables. This example highlights the importance of identifying the true causes behind a correlation and avoiding hasty conclusions.
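A small simulation can make the umbrella story concrete. In this sketch (all numbers hypothetical), rain is the only thing driving both umbrella sales and car accidents; umbrellas never appear in the accident formula. Across all days the two variables are strongly correlated, but when rainy days and dry days are examined separately, the correlation essentially disappears, because the common cause is being held fixed.

```python
import random


def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_days(n_days=2000, seed=42):
    rng = random.Random(seed)
    days = []
    for _ in range(n_days):
        rainy = rng.random() < 0.5
        # Rain drives both variables; umbrella sales never enter the accident equation.
        umbrellas = 20 + 40 * rainy + rng.gauss(0, 5)
        accidents = 3 + 4 * rainy + rng.gauss(0, 1)
        days.append((rainy, umbrellas, accidents))
    return days


days = simulate_days()
overall = corr([d[1] for d in days], [d[2] for d in days])
rainy_only = corr([d[1] for d in days if d[0]], [d[2] for d in days if d[0]])
dry_only = corr([d[1] for d in days if not d[0]], [d[2] for d in days if not d[0]])
print(f"all days: {overall:.2f}, rainy days: {rainy_only:.2f}, dry days: {dry_only:.2f}")
```

Splitting the data by rainfall is the simplest way of "controlling for" a confounder; the later sections describe more general techniques.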
Potential Outcomes Framework
Now, let's delve into the key concepts of causality, often referred to as the Potential Outcomes Framework. The primary goal is to understand the effect of a treatment on a specific group of individuals. To illustrate these concepts, we'll consistently use the example of examining how an increase in study load affects anxiety levels in students.
Before we dig further, here is a quick explanation of the jargon.
Outcome: Object of interest
- Student anxiety levels
Treatment: Our (assumed) cause
- Increase in study load
Treatment Effect: The impact of the treatment on the outcome of interest
- The effect of an increased study load on anxiety levels among students
Treated Group: Group who received/experienced the treatment
- Students who experience an increased study load
Control Group: Group who did not receive/experience the treatment
- Students who do not face an increased study load
Confounders: Factors that affect both treatment assignment and the outcome
- Sleep patterns, personal stressors, or prior academic performance
For example, a student's poor prior academic performance might lead to both an increased study load and a higher anxiety level at the same time. Because confounders change the treatment and the outcome simultaneously, it is easy to mistake correlation for causation in this setting.
One might conclude that the increased study load is causing the higher anxiety level, when prior academic performance is actually causing it.
Let's stop for a minute and get back to the Treatment Effect. When we defined the term, we didn't discuss in detail who experiences the impact. The Treated Group? The Control Group? Or the whole population? In fact, there are three different types of Treatment Effect, depending on which group is the focus.
Average Treatment Effect of the Treated Group (ATT)
This represents the average impact of the treatment on those who actually receive it. In our example, it would be the average change in anxiety levels among students who experience an increased study load.
Average Treatment Effect of the Control Group (ATC)
This denotes the average effect the treatment would have on those who do not receive it. In our case, it would be the average change in anxiety levels that students who do not face an increased study load would experience if they did.
Average Treatment Effect on Everyone (ATE)
This provides an overall average treatment effect on the entire population, encompassing both the treated and control groups. In our example, it would be the average change in anxiety levels for all students, regardless of whether they experienced an increased study load or not.
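To see how the three effects differ, the sketch below uses a tiny made-up class in which we pretend to know both potential outcomes for every student: anxiety without an increased study load (y0) and with it (y1). In real data only one of the two is ever observed per student, which is exactly why the techniques later in this article are needed. The names `Student`, `y0` and `y1` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    treated: bool  # did this student actually experience an increased study load?
    y0: float      # potential anxiety level WITHOUT an increased study load
    y1: float      # potential anxiety level WITH an increased study load

    @property
    def effect(self) -> float:
        # Individual treatment effect: only computable because we invented both outcomes
        return self.y1 - self.y0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


students = [
    Student("A", True, 4, 8),
    Student("B", True, 5, 8),
    Student("C", True, 3, 6),
    Student("D", False, 2, 3),
    Student("E", False, 4, 5),
    Student("F", False, 3, 5),
]

att = mean(s.effect for s in students if s.treated)      # treated students only
atc = mean(s.effect for s in students if not s.treated)  # control students only
ate = mean(s.effect for s in students)                   # everyone
print(f"ATT={att:.2f}  ATC={atc:.2f}  ATE={ate:.2f}")
```

With these numbers the script prints `ATT=3.33  ATC=1.33  ATE=2.33`. In this invented class the treated students happen to be the ones most affected by extra study load, so the three effects differ.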
By applying the Potential Outcomes Framework to the example of how increased study load affects anxiety levels, we can systematically analyze and interpret the causal relationships involved. This framing is the first step in approaching any causal problem, and drawing a Directed Acyclic Graph helps one understand the problem better.
Introducing the DAG
Directed Acyclic Graphs, or DAGs, serve as visual representations of causal relationships, providing a clear and intuitive way to understand connections among variables. Think of it as a blueprint of a causal model.
In our example of how increased study load might impact anxiety levels, a DAG shows how these variables are expected to relate. We anticipate that study load directly affects anxiety levels. We also acknowledge the influence of other variables, such as academic performance, on anxiety levels. Academic performance might also influence how much study load a student takes on, for instance when a student takes more AP courses than average.
DAGs are constructed based on subject-matter knowledge and serve as a supplemental tool to visualize our hypothesis.
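A DAG can also be written down in code as a mapping from each variable to the variables it directly affects. This is a minimal sketch assuming the arrows described above (academic performance to study load, academic performance to anxiety, study load to anxiety); the variable names are hypothetical. It checks that the graph has no cycles (the "acyclic" part) using Kahn's topological-sort algorithm, and lists variables with an arrow into both the treatment and the outcome, a simplified way to spot confounders.

```python
from collections import deque

# Hypothetical encoding of the DAG described above: each key points to its direct effects.
dag = {
    "academic_performance": ["study_load", "anxiety"],
    "study_load": ["anxiety"],
    "anxiety": [],
}


def is_acyclic(graph):
    """Kahn's algorithm: a directed graph is acyclic iff every node can be topologically ordered."""
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] += 1
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in graph[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(graph)


def common_causes(graph, treatment, outcome):
    """Nodes with a direct arrow into both treatment and outcome (a simplified confounder check)."""
    return sorted(
        node for node, children in graph.items()
        if treatment in children and outcome in children
    )


print(is_acyclic(dag))                              # True
print(common_causes(dag, "study_load", "anxiety"))  # ['academic_performance']
```

Real confounders can also act through longer paths, so checking only direct arrows is a deliberate simplification for this small graph.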
Techniques via Examples
Causal techniques share the same goal: they aim to estimate the causal influence of the treatment on the outcome, excluding the influence of confounders on the outcome. They achieve this by making the control group and the treated group look alike except for the treatment assignment.
Coming back to the DAG above, we are interested in the arrow from study load (treatment) to anxiety level (outcome). However, there is another arrow from academic performance (confounder) to anxiety level (outcome). Because two factors influence the anxiety level, it is hard to accurately estimate the treatment effect (the change in anxiety level caused by study load). A naive comparison only gives us a contaminated treatment effect: the change in anxiety level influenced by study load AND academic performance.
Think of it this way. You are on an innovative pizza diet (treatment), and you want to know how much your weight (outcome) changed because of the diet (treatment effect). However, for some reason, you have to weigh yourself while holding your dog (confounder). In this case, it is hard to know how much weight you gained or lost, because the measurement includes not only your weight but also the dog's. In the study of causality, there are various ways to estimate the change in weight even in this setting. Let's start with RCT.
RCT
Randomized Controlled Trials (RCTs) rely on randomization to adjust for the dog's weight. Imagine a scientist selects a hundred dog owners and randomly assigns fifty of them to start the pizza diet (treatment group) and the other fifty to eat as before (control group). Then, in February, she subtracts the average weight of the control group holding their dogs (AVGcontrol) from the average weight of the treated group holding their dogs (AVGtreatment). In this case, this simple difference in means is a valid estimate of the average treatment effect (the change in weight caused by the pizza diet).
There is a key reason why this works. We assume that AVGcontrol is AVGtreatment's counterfactual, a fancy word for the 'would-have-been' average of the treated group had they not been treated. In other words, we treat AVGcontrol as a proxy for what the average weight of the treated group would have been if they had not done the pizza diet. This logic is sound because of randomization.
Randomization ensures the treatment and control groups look alike except for the treatment. When you flip a coin to decide who tries the pizza diet, the people assigned to the treatment and those who are not share, on average, the same characteristics (confounders). Because of this similarity, we can use the average of the control group as the counterfactual of the treated group. The dogs' extra weight is no longer a problem because the average weight of the dogs will be the same for the treated and control groups. By subtracting the two group averages, we eliminate the dogs' weights.
Mathematical Illustration
(Avg weight of the treated group + Avg weight of the TG’s dogs) – (Avg weight of the control group + Avg weight of the CG’s dogs)
= (Avg weight of the treated group) + (Avg weight of the TG’s dogs – Avg weight of the CG’s dogs) – (Avg weight of the control group)
(Under randomization, Avg weight of the TG's dogs = Avg weight of the CG's dogs, so the middle term is zero)
= (Avg weight of the treated group) – (Avg weight of the control group)
= Average Treatment Effect
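The same arithmetic can be checked with a simulated RCT. In this sketch, the owner and dog weights, the 2 kg diet effect and the 20,000-owner sample are all hypothetical. Random assignment does not make the two groups' dog weights exactly equal, only nearly so in a large sample, so the leftover dog-weight gap is close to zero and the difference in means on the scale lands close to the true effect.

```python
import random


def mean(values):
    return sum(values) / len(values)


TRUE_EFFECT = 2.0  # hypothetical weight change (kg) caused by the pizza diet
rng = random.Random(2024)

# Hypothetical population: each owner's own weight and their dog's weight (kg).
owners = [{"owner": rng.gauss(75, 10), "dog": rng.gauss(20, 5)} for _ in range(20_000)]

# Coin-flip style randomization: shuffle, then send the first half to the pizza diet.
rng.shuffle(owners)
half = len(owners) // 2
treated, control = owners[:half], owners[half:]

# What the scale shows: owner weight (+ diet effect if treated) + dog weight.
avg_treatment = mean([o["owner"] + TRUE_EFFECT + o["dog"] for o in treated])
avg_control = mean([o["owner"] + o["dog"] for o in control])

# The two pieces of the derivation: owners' gap and the dogs' gap (the middle term).
owner_gap = mean([o["owner"] + TRUE_EFFECT for o in treated]) - mean([o["owner"] for o in control])
dog_gap = mean([o["dog"] for o in treated]) - mean([o["dog"] for o in control])

print(f"AVGtreatment - AVGcontrol: {avg_treatment - avg_control:.2f}")
print(f"dog-weight gap (close to 0 under randomization): {dog_gap:.2f}")
```

The difference on the scale always equals the owners' gap plus the dogs' gap, exactly as in the derivation; randomization is what makes the second term negligible.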
But what if the scientist assigns the treatment based on the dog's age? Then the treated and control groups no longer look alike. Perhaps young dogs exercise more and therefore weigh less. Then the average weight of the TG's dogs and the average weight of the CG's dogs are no longer the same, so we cannot eliminate the dogs' weights by simply subtracting the two averages.
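This sketch, again with hypothetical numbers and assuming older dogs weigh more, repeats the experiment many times under two assignment rules: a coin flip, and giving the diet to owners of younger dogs. The coin flip recovers the true 2 kg effect on average, while age-based assignment mixes the dogs' weight gap into the estimate.

```python
import random


def mean(values):
    return sum(values) / len(values)


def make_owner(rng):
    dog_age = rng.randint(1, 14)  # years, hypothetical
    # Assumption: older dogs weigh more (young dogs exercise more).
    dog_weight = 10 + 1.5 * dog_age + rng.gauss(0, 2)
    owner_weight = rng.gauss(75, 10)
    return dog_age, dog_weight, owner_weight


def estimate(assign, n_owners=100, true_effect=2.0, trials=1000, seed=7):
    """Average difference in means on the scale over many simulated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        treated, control = [], []
        for _ in range(n_owners):
            dog_age, dog_weight, owner_weight = make_owner(rng)
            if assign(dog_age, rng):
                treated.append(owner_weight + true_effect + dog_weight)
            else:
                control.append(owner_weight + dog_weight)
        estimates.append(mean(treated) - mean(control))
    return mean(estimates)


by_coin = estimate(lambda age, rng: rng.random() < 0.5)  # randomized assignment
by_age = estimate(lambda age, rng: age <= 7)             # owners of younger dogs get the diet
print(f"random assignment: {by_coin:.2f}, age-based assignment: {by_age:.2f}")
```

Under these assumptions the age-based estimate comes out strongly negative even though the diet adds weight, because the treated owners are holding lighter dogs.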
Matching
RCTs are a simple and powerful way to extract the treatment effect, but they take a lot of money and time. If we have data on the dog (confounder), we can mimic the effect of randomization.
Let's say we have data on each owner's weight while holding their dog, whether the owner is on the pizza diet, and the dog's age. Then, for each treated owner, we can find a control owner whose dog is the same age to serve as the treated owner's counterfactual. By matching, we once again make the dogs' weights (confounder) alike, as we did for the RCT, which lets us estimate the treatment effect by subtracting the two owners' weights while holding their dogs. We repeat this process for all treated owners and average the differences to get the average treatment effect on the treated (ATT).
But what if a treated owner has multiple control owners with dogs of the same age? In a real setting, we would have more detailed data, such as the dog's breed, age, and gender, so multiple matched controls would be much less likely. If it does happen, however, we can use the average of the matched controls as the counterfactual.
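Matching on dog age can be sketched in a few lines. The records below are invented: each row says whether the owner is on the pizza diet, the dog's age, and the owner's weight while holding the dog. Each treated owner is compared with the average of all control owners whose dog has the same age; treated owners with no same-age match are simply dropped here, which is one design choice among several. Because the differences are averaged over treated owners, the result is an ATT.

```python
from collections import defaultdict

# Hypothetical observational records: (on_pizza_diet, dog_age, scale_weight_kg),
# where scale_weight_kg is the owner's weight while holding their dog.
records = [
    (True, 2, 88.0),
    (True, 5, 95.0),
    (True, 9, 101.0),
    (False, 2, 85.0),
    (False, 5, 92.0),
    (False, 5, 94.0),
    (False, 9, 97.0),
    (False, 12, 110.0),
]


def matched_att(rows):
    """Exact matching on dog age; multiple matched controls are averaged."""
    controls_by_age = defaultdict(list)
    for treated, age, weight in rows:
        if not treated:
            controls_by_age[age].append(weight)

    differences = []
    for treated, age, weight in rows:
        if not treated:
            continue
        matches = controls_by_age.get(age)
        if not matches:
            continue  # no control owner with a same-age dog: drop this treated owner
        counterfactual = sum(matches) / len(matches)
        differences.append(weight - counterfactual)
    return sum(differences) / len(differences)


print(matched_att(records))
```

For this data the matched differences are 3, 2 and 4 kg (the age-5 owner is compared with the average of 92 and 94), so `matched_att` returns 3.0. The control owner with a 12-year-old dog has no treated counterpart and is never used.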
IV
Let's get back to the students' anxiety levels. We want to estimate how much students' anxiety level (outcome) increases in response to their workload (treatment). As we did in matching, we can try to make the control group and the treated group look alike except for the treatment, using a dataset that records the confounders. But how do we know we have included all possible confounders, and what if a significant confounder cannot be captured in a dataset, for example, students' personality or natural abilities?
In this setting, problems arise because our TG and CG no longer look alike: 'something' influences the assignment of the treatment, but we cannot analyze this 'something' because it is not in our dataset. This is called the Omitted Variable Bias (OVB) problem, and it can be addressed with an Instrumental Variable (IV).
It is not hard to imagine that a student's personality affects both her workload and her anxiety level. Perhaps students with perfectionist personality traits tend to work more and therefore become more anxious. They may also be naturally more anxious because of their personality, regardless of workload. Our treated group will then contain more perfectionists than the control group, so simply comparing the two average anxiety levels gives an inaccurate treatment effect.
Mathematical Illustration
Avg Anxiety Level of TG – Avg Anxiety Level of CG
= (Anxiety from higher workload + Anxiety from perfectionist traits) – (Anxiety from lower workload)
= (Anxiety from difference in workload) + (Anxiety from perfectionist traits)
In the expression above, our desired treatment effect (anxiety from the difference in workload) is contaminated by anxiety from perfectionist traits (OVB).
Consider extracurricular activities from the DAG. It is related to student workload (students who participate in extracurricular activities work more), is not related to the outcome directly (Unlike perfectionist traits, students do not get more anxious because of the extracurricular activity itself), and finally is not related to the omitted variable (extracurricular activity participation is not relevant to perfectionist traits).
Because extracurricular activity (IV) is unrelated to the problematic personal traits (omitted variable), whether or not a student participates in extracurricular activities is effectively random from the viewpoint of the omitted variable. At the same time, treatment assignment is influenced by the IV.
Therefore, by using the IV, we introduce randomness into treatment assignment with respect to the omitted variable, recovering similarity between the control group and the treated group.
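The article does not spell out how an IV estimate is computed. One standard approach with a single instrument is the Wald (ratio) estimator: the covariance of the instrument with the outcome divided by its covariance with the treatment. The simulation below is a sketch under strong assumptions (linear effects, a constant true effect of 2, and made-up coefficients). Perfectionism is generated but never used by either estimator, mimicking an omitted variable, and extracurricular participation is a coin flip that raises workload.

```python
import random


def cov(xs, ys):
    """Sample covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate_students(n=20_000, true_effect=2.0, seed=3):
    rng = random.Random(seed)
    extracurricular, workload, anxiety = [], [], []
    for _ in range(n):
        perfectionism = rng.gauss(0, 1)          # omitted variable: never recorded
        z = 1.0 if rng.random() < 0.5 else 0.0   # instrument: independent of perfectionism
        x = 10 + 3 * z + 2 * perfectionism + rng.gauss(0, 1)       # workload (hypothetical units)
        y = true_effect * x + 4 * perfectionism + rng.gauss(0, 1)  # anxiety score
        extracurricular.append(z)
        workload.append(x)
        anxiety.append(y)
    return extracurricular, workload, anxiety


z, x, y = simulate_students()
naive_slope = cov(x, y) / cov(x, x)  # contaminated by perfectionism (OVB)
iv_estimate = cov(z, y) / cov(z, x)  # Wald / ratio IV estimator
print(f"naive: {naive_slope:.2f}, IV: {iv_estimate:.2f}")
```

Here the naive slope overstates the effect (its expected value in this setup is about 3.1) because perfectionism raises both workload and anxiety, while the IV estimate lands near the true value of 2.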
Choosing Between Causation and Correlation
Causation, as discussed earlier, relies on cause-and-effect relationships. For instance, if we can attribute an increase in sales (B) to a specific factor (A), we have uncovered a causal relationship: B occurred because of A. In contrast, correlation describes the statistical association between variables without implying a direct cause-and-effect relationship. A simple increase in A coinciding with an increase in B doesn't necessarily imply causation.
Understanding when to choose between causation and correlation is crucial for strategic decisions. In the economic concept of products being substitutes or complements, the relationship is inherently correlated rather than causal. For example, if the price of printers decreases, it correlates with an increase in the demand for printer ink, exemplifying a typical relationship between complementary products. This correlation provides valuable insights for businesses in shaping their pricing strategies. By recognizing the degree of correlation between two products, companies can strategically adjust their pricing to capitalise on this relationship. For instance, they may implement bundled pricing or promotional offers to encourage customers to purchase both items, ultimately optimizing revenue and enhancing overall market competitiveness.
Examining a potential causal relationship requires a comprehensive analysis of several factors. For instance, if the price of tea increases, influencing the demand for coffee due to affordability considerations, we can employ tools such as regression analysis. Regression helps quantify the extent of this relationship, providing valuable insights into how much the demand for coffee may increase in response to a rise in the price of tea. Businesses can use this causal relationship to their advantage by planning inventory and production. This capability also allows for creating contingency plans to adapt to changes in the industry and address them quickly.
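As a minimal sketch of what regression quantifies, the code below fits a straight line by ordinary least squares to invented weekly figures for tea price and coffee sales. The slope reads as a causal effect only under the conditions discussed earlier, when confounders have been dealt with; on its own it measures association.

```python
def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical weekly data: tea price per box and coffee units sold.
tea_price = [4.0, 4.5, 5.0, 5.5, 6.0]
coffee_units = [200, 210, 220, 230, 240]
slope, intercept = ols_slope_intercept(tea_price, coffee_units)
print(f"each +1.00 in tea price goes with about {slope:.1f} more coffee units")
```

With these made-up figures the fitted slope is 20.0 coffee units per 1.00 increase in tea price, and the intercept is 120.0.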
Conclusion
In conclusion, armed with two powerful analytical tools, we as analysts need to exercise caution in selecting the appropriate methodology based on the specific circumstances. It is imperative not to succumb to the common misconception that correlation implies causation! Making informed choices about when and how to employ these tools is essential for accurate and meaningful insights, safeguarding against the potential pitfalls of misinterpretation.