Omitted variable bias occurs when your linear regression model is not correctly specified. This may be because you don’t know the confounding variables.
Confounding variables influence both the cause and the effect that researchers are trying to assess in a study. So, if the researcher cannot include these confounding variables in the statistical model, the model can exaggerate or hide the real association that exists between two other variables.
When a researcher omits confounding variables, the statistical procedure is forced to attribute their effects to the variables that remain in the model. This biases the estimated effects and confounds the true relationship. Statisticians refer to this distortion as omitted variable bias.
Regardless, it is a serious condition that can invalidate your research findings.
In this post, you’ll learn about omitted variable bias, how it occurs in research, how you can detect it, and how to avoid it.
When a researcher cannot include the right control variables in a regression analysis, the estimates will be biased. This bias is known as omitted variable bias.
Put differently, when the confounding variables in a study are unknown, or the data needed to measure them do not exist, the model has omitted variables.
It is one of the most significant problems to occur in regression analysis.
Let’s consider this example.
It is assumed that the more knowledge you gain, the greater your earning power. Recent studies show that children who have access to many books develop and perform better academically.
Can we then say that if parents stock their shelves with books, their children will be employed in high-paying jobs when they grow up? Or has the research omitted a significant variable?
Let us analyze the traits present in the parents.
Is there any possibility that the parents with higher IQs would have more books on their shelves, leading to the higher academic performance of their children?
So can we say that the higher academic performance and higher earning power of the children are attributable to the books on the shelves, to the IQs of the parents, or to both?
In the example above, the omitted variable would be the parents’ IQs. Having books on the shelves cannot be the only factor contributing to the higher academic performance and higher earning power of the children. An important factor has been ignored in the data, and that factor is the omitted variable.
Omitted variable bias refers to a bias that results from leaving out of a study variables that are significant to its results.
When there is an omitted variable in research, it can lead to an incorrect conclusion about the influence of different variables on a particular result.
Let’s consider an instance where a researcher tries to understand what influences unemployment. If the researcher were to ignore the effect of the minimum wage on the rate of unemployment, there would be omitted variable bias. This is common in regression analysis.
So when the independent variable shows an effect caused by other variables that have been ignored in the research, that is an omitted variable bias.
Note that omitted variables occur mostly in observational studies.
Let us look at this example to better understand the concept of omitted variable bias.
Patients have both legs X-rayed, and researchers in a biochemical laboratory use the results to study the effect of physical activity on bone density.
The researchers measure several characteristics of each patient, including the level of activity, the weight, and the bone density.
Earlier studies suggest that there may be a relationship between bone density and level of activity: the higher the level of activity, the greater the bone density.
Using the leg X-rays, the researchers test whether there is a relationship between the level of activity and bone density. The result shows no supporting evidence of a relationship.
On a second look, they found that a confounding variable was missing from the model.
The first test had used the activity level as the only independent variable. However, it turns out that both bone density and activity level correlate with another variable, the patient’s weight.
To avoid omitted variable bias, the weight of the patient was included in the regression model alongside the activity level. The result showed that both weight and activity level are statistically significant and correlate with bone density.
This example shows that once an omitted variable is detected and added to the model, the coefficient estimates can be corrected and the bias avoided.
Now we are going to consider the causes of omitted variable bias in research. Why do they appear in research?
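For omitted variable bias to be present in research, two conditions must be satisfied: the omitted variable must correlate with the dependent variable, and it must correlate with at least one independent variable that is included in the model.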
So when the researcher’s assumed specification is incorrect because it omits an explanatory (independent) variable that helps determine the dependent variable, omitted variable bias exists in the study.
This bias arises because the statistical model attributes the effects of the neglected variables to the variables that are included.
This means that one assumption made by the researcher has been violated by the residuals.
For instance, suppose you have two important independent variables in your regression model, X1 and X2. The two independent variables correlate with each other and also with the dependent variable, which is exactly the setting in which omitting one of them causes bias.
Let us now assume that the second independent variable is taken out of the model, as it is the confounding variable. Now, this is what to expect:
The model will not fit the data as well because one significant independent variable has been removed from it. This widens the gaps between the fitted values and the observed values. These gaps are the residuals.
Also, how much each residual grows depends on the relationship between the dependent variable and the variable that was taken out (X2). The result is that X2 correlates with the residuals.
Therefore, X1 correlates with X2 while X2 correlates with the residuals. Since X1 correlates with X2, X1 ends up correlating with the residuals as well.
Hence, the assumption that the independent variables do not correlate with the residuals is violated. Because this assumption has been violated, the model produces biased estimates.
The point to take from the above scenario is that when you omit a confounding variable, you get larger residuals. You also leave the coefficient estimates biased.
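The X1 and X2 scenario above can be checked with a small simulation. This is only a sketch under stated assumptions: the data are simulated, the true effects (2.0 and 3.0) and the strength of the link between X1 and X2 are made-up values, and `ols` is a hypothetical helper built on NumPy’s least-squares solver.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, slope of first column, ...]

rng = np.random.default_rng(0)
n = 50_000
beta1, beta2 = 2.0, 3.0  # hypothetical true effects of X1 and X2

x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)  # X1 correlates with X2
y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

full = ols(y, x1, x2)   # correctly specified model
short = ols(y, x1)      # same model with X2 omitted
delta = ols(x2, x1)[1]  # slope from regressing X2 on X1

# The bias in the X1 coefficient is roughly (effect of X2) * (slope of X2 on X1).
observed_bias = short[1] - beta1
predicted_bias = beta2 * delta

# Residuals of the short model still carry the omitted variable.
resid_short = y - (short[0] + short[1] * x1)
resid_corr = np.corrcoef(resid_short, x2)[0, 1]

print(f"X1 coefficient, full model:  {full[1]:.2f}")
print(f"X1 coefficient, X2 omitted:  {short[1]:.2f}")
print(f"corr(short-model residuals, X2): {resid_corr:.2f}")
```

One caveat on the wording: the fitted residuals of a least-squares model are uncorrelated with the included variables by construction, so the violated assumption really concerns the unobserved error term. What you can see in the data is that the residuals of the short model correlate with X2.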
There is no general statistical test that detects omitted variable bias directly from the data. If you can measure the possible omitted variables, include them in your model. If you cannot, an instrumental variable, where one is available, offers another route.
If you have neither a measurement for a possible omitted variable nor an instrument, you have to assume that leaving the variable out does not bias your estimates, and you should state that assumption openly.
Omitted variable bias is common in an observational study. So if a researcher is conducting a test that uses random assignment, omitted variable bias is not likely to occur. This is because random assignment reduces the effect of confounding variables by dispersing them across the study groups.
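To see why random assignment helps, here is a minimal simulated sketch. Everything in it is an assumption made for illustration: the true treatment effect of 1.0, the strength of the confounder, and the `slope` helper, which is just the simple-regression slope.

```python
import numpy as np

def slope(x, y):
    """Slope of the simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(1)
n = 50_000
true_effect = 1.0  # hypothetical effect of the treatment

confounder = rng.normal(size=n)  # never included in either regression

# Observational study: units with a high confounder value are more often treated.
treated_obs = (confounder + rng.normal(size=n) > 0).astype(float)
y_obs = true_effect * treated_obs + 2.0 * confounder + rng.normal(size=n)

# Randomized study: treatment is a coin flip, unrelated to the confounder.
treated_rand = rng.integers(0, 2, size=n).astype(float)
y_rand = true_effect * treated_rand + 2.0 * confounder + rng.normal(size=n)

est_obs = slope(treated_obs, y_obs)
est_rand = slope(treated_rand, y_rand)

print(f"observational estimate: {est_obs:.2f}")
print(f"randomized estimate:    {est_rand:.2f}")
```

In the randomized version the confounder still affects the outcome, but because it no longer correlates with the treatment, leaving it out does not bias the estimated effect.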
However, if you suspect that an ignored confounding variable might be causing omitted variable bias, you may be able to test for it, especially if you have an instrumental variable in your study.
Another way to detect omitted variable bias is to examine the theory and check other studies. Ask yourself whether your results agree with what theory predicts and with what other researchers have found.
If you are unable to properly answer these questions, you can consult other pre-existing studies or contact experts for their opinion.
You, as the researcher, however, should know that there is no way to determine if your regression analysis suffers from omitted variable bias by merely looking at the data used in the regression analysis.
If there are omitted variables in research, then what are the effects or consequences of these variables?
Having an omitted variable in research can bias the estimated outcome of the study and lead the researcher to an erroneous conclusion. This means that, while the researcher assesses the effects of the independent variable, the bias can produce other problems in the regression analysis.
Some of these problems are:
This is why the researcher has to be extremely careful: any of these issues can affect the findings of the study.
To avoid omitted variable bias, the researcher should gather as much background knowledge as possible before the study begins. The researcher should collect information about the study area and review the existing literature and publications. They should also contact experts for information.
Following all these processes will enable the researcher to identify and even measure possible confounding variables that should be included in the research model. In some experiments, the researcher can even develop control variables to combat confounding variables.
Doing all of these will help the researcher to avoid the probable issues that may arise in the first place. This is because no researcher would want to gather all data and then realize that the critical variable was not even measured. It’ll be an enormous setback.
After conducting the analysis, the background knowledge or information gathered by the researcher can help to identify possible biases and determine the appropriate solution if necessary.
Researchers should also check the residual plots, because it may sometimes be unclear whether bias exists. Residual plots can clearly display the hallmarks of a confounding variable.
Sometimes omitted variable bias is not a serious problem, because the bias shrinks as the correlation between the omitted variable and the included variables weakens. So the researcher can judge how much bias to expect by understanding the association between the variables in the research model and the confounding variables.
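The link between correlation and bias can be sketched with simulated data. The function name `short_slope_bias` and the true effects (2.0 and 3.0) are hypothetical choices for this illustration, not values from any real study.

```python
import numpy as np

def short_slope_bias(rho, beta1=2.0, beta2=3.0, n=50_000, seed=0):
    """Bias in the X1 coefficient when X2 is omitted and corr(X1, X2) is about rho."""
    rng = np.random.default_rng(seed)
    x2 = rng.normal(size=n)
    # Build X1 with unit variance and correlation of roughly rho with X2.
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    slope = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)  # regression of y on X1 only
    return slope - beta1

biases = {rho: short_slope_bias(rho) for rho in (0.0, 0.2, 0.5, 0.8)}
for rho, bias in biases.items():
    print(f"corr(X1, X2) = {rho:.1f}   bias in X1 coefficient = {bias:+.2f}")
```

With these assumptions the bias is close to zero when X1 and X2 are uncorrelated and grows steadily as the correlation rises.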
The researcher should take note that there can be a trade-off between the precision of the estimates and omitted variable bias. So as you add confounding variables to decrease the bias, do not lose sight of the precision of the estimates.
A good way to track the precision of the estimates is to check the confidence intervals of the coefficient estimates. The estimate is less precise if the confidence interval becomes wider. When this happens, the researcher might have to accept a little bias if doing so improves the precision significantly.
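Here is a rough sketch of how a confidence interval can widen when a strongly correlated variable is added. The data are simulated, the effect sizes are invented, `ols_with_se` is a hypothetical helper using the textbook standard-error formula, and the intervals use the normal approximation (1.96) rather than an exact t value.

```python
import numpy as np

def ols_with_se(y, *columns):
    """Coefficients and classical standard errors for y on an intercept plus columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof             # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the coefficients
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x1 = 0.95 * x2 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # strongly correlated
y = 2.0 * x1 + 0.3 * x2 + rng.normal(size=n)  # X2 has a small hypothetical effect

coef_short, se_short = ols_with_se(y, x1)      # X2 omitted: some bias, tighter interval
coef_full, se_full = ols_with_se(y, x1, x2)    # X2 included: less bias, wider interval

width_short = 2 * 1.96 * se_short[1]  # approximate 95% interval width for X1
width_full = 2 * 1.96 * se_full[1]

print(f"X1 estimate without X2: {coef_short[1]:.2f}, interval width {width_short:.2f}")
print(f"X1 estimate with X2:    {coef_full[1]:.2f}, interval width {width_full:.2f}")
```

In a setup like this, where the extra variable has only a small effect but correlates strongly with X1, including it removes a little bias at the cost of a much wider interval, which is the situation in which accepting some bias may be reasonable.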
This article has explained how omitted variable bias can lead researchers to erroneous conclusions. Researchers should carefully assess their study findings to determine whether omitted variables may be biasing the estimated coefficients.
Also, before conducting a study, anticipate and plan for confounding variables, because there is still no general statistical method that tests for omitted variable bias from the data alone.