What is a regression analysis?
A regression analysis is a statistical tool used to identify relationships between variables. For example, if a researcher wanted to determine whether there was a relationship between having blue hair and a predisposition for eating cookies, she would run a regression analysis to see whether these two traits are correlated.
How do you run a regression analysis?
A researcher would identify a sample of cookie eaters with various hair colors. She would collect data on both their hair color and how many cookies they ate, as well as on potential confounds (more on that below). By comparing cookie consumption across hair colors while accounting for as many confounding variables as she can identify, she can get a picture of whether people with blue hair consume more cookies than people with other hair colors.
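A minimal sketch of the simplest version of this analysis, written in Python with NumPy. The sample below is made-up data for illustration, and hair color is reduced to a hypothetical yes/no indicator (`blue_hair`); a real study would also record the confounds discussed below.

```python
import numpy as np

# Hypothetical sample: 1 = blue hair, 0 = any other hair color,
# paired with the number of cookies each person reported eating in a week.
blue_hair = np.array([1, 1, 1, 0, 0, 0, 0, 0])
cookies = np.array([12, 15, 9, 7, 6, 10, 5, 8], dtype=float)

# Design matrix: a column of ones for the intercept plus the hair-color indicator.
X = np.column_stack([np.ones_like(cookies), blue_hair])

# Ordinary least squares: choose coefficients that minimize squared prediction errors.
coef, *_ = np.linalg.lstsq(X, cookies, rcond=None)
intercept, blue_effect = coef

print(f"Average cookies, other hair colors: {intercept:.2f}")
print(f"Extra cookies associated with blue hair: {blue_effect:.2f}")
```

With a single yes/no predictor, the fitted coefficient is simply the difference between the two group averages: the blue-haired group averages 12 cookies and everyone else averages 7.2, so the coefficient is 4.8.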
What if there are several variables at work?
A regression analysis helps rule out other possible explanations for the correlation by including them in the analysis. When looking for a relationship between blue hair and cookie eating, a researcher should consider all the potential factors influencing cookie-eating habits. Some cookie fans may eat more food in general, have more money with which to buy cookies, or have access to better cookie recipes.
Those variables are called “confounds,” and a regression analysis allows a researcher to control for confounding variables during the comparison. For example, if some cookie eaters are also employees of a cookie factory and receive free cookies at work, the researcher would hold constant the number of cookies eaten during work hours.
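The sketch below illustrates what "controlling for" a confound does, using simulated data under an assumed scenario: in this made-up population, blue-haired people happen to eat more food overall, and overall food intake (not hair color) drives cookie eating. The variable names and the `ols` helper are hypothetical, written with NumPy for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 5_000

# Assumed data-generating process (for illustration only):
# blue hair is linked to eating more food overall, and overall food
# intake, not hair color, is what drives cookie eating.
blue_hair = rng.integers(0, 2, size=n)  # 0 or 1
total_food = 20 + 5 * blue_hair + rng.normal(0, 3, size=n)
cookies = 0.5 * total_food + rng.normal(0, 2, size=n)  # no direct hair effect


def ols(y, *predictors):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


naive = ols(cookies, blue_hair)                 # hair color only
adjusted = ols(cookies, blue_hair, total_food)  # hair color + the confound

print(f"Blue-hair coefficient, no controls:          {naive[1]:.2f}")
print(f"Blue-hair coefficient, controlling for food: {adjusted[1]:.2f}")
```

Because of how the data were simulated, the uncontrolled coefficient should come out near 2.5 (blue hair appears to add cookies), while the controlled coefficient should be close to 0, matching the assumption that hair color has no direct effect once overall food intake is held constant.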
What about collinearity?
Collinearity occurs when variables within a regression are so highly correlated that it becomes impossible to separate their effects. A regression analysis that includes collinear variables makes it impossible to interpret the effect of any of those factors individually. For example, overall food consumption and cookie consumption at the cookie factory are highly correlated: someone who eats more food is highly likely to also eat more cookies at work, regardless of hair color. The researcher should remove either overall food consumption or cookie consumption at work to get a clearer picture of cookie eating by hair color.
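One common way to check for collinearity is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 − R²). The sketch below uses simulated, hypothetical data in which factory cookie consumption tracks overall food intake closely; the `vif` helper is written here for illustration with NumPy rather than taken from a statistics library.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 1_000

# Hypothetical predictors: cookies eaten at the factory track overall
# food intake almost perfectly, which is the collinearity problem.
total_food = rng.normal(20, 3, size=n)
factory_cookies = 0.4 * total_food + rng.normal(0, 0.2, size=n)
blue_hair = rng.integers(0, 2, size=n).astype(float)


def vif(target, *others):
    """Variance inflation factor: 1 / (1 - R^2) from regressing one predictor on the rest."""
    X = np.column_stack([np.ones(len(target)), *others])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    r_squared = 1 - residuals.var() / target.var()
    return 1 / (1 - r_squared)


for name, target, others in [
    ("total_food", total_food, (factory_cookies, blue_hair)),
    ("factory_cookies", factory_cookies, (total_food, blue_hair)),
    ("blue_hair", blue_hair, (total_food, factory_cookies)),
]:
    print(f"VIF for {name}: {vif(target, *others):.1f}")
```

In this simulation, `total_food` and `factory_cookies` should show large VIFs (well above 10), while `blue_hair` stays near 1. A frequently cited rule of thumb treats VIFs above about 5 or 10 as a warning sign, though the cutoff is a convention rather than a fixed rule.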
What can we conclude?
A proper regression analysis can indicate whether two variables have a positive, negative, or no correlation. If we run a regression analysis that controls for all of the factors potentially influencing the relationship and we still see that people with blue hair eat more cookies, we can conclude that blue hair is positively correlated with cookie consumption. However, a regression analysis does not let us say that having blue hair causes an increase in cookie consumption; it shows only that the two are correlated.
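One way to write down the kind of model described above, as an illustrative sketch: here blue_i is 1 for a person with blue hair and 0 otherwise, food_i is that person's overall food consumption (standing in for any measured confound), and ε_i collects everything the model does not measure. These variable names are hypothetical.

```latex
\text{cookies}_i = \beta_0 + \beta_1 \,\text{blue}_i + \beta_2 \,\text{food}_i + \varepsilon_i
```

If the estimate of β1 is reliably above zero, blue hair is positively correlated with cookie consumption after accounting for food intake; if it is reliably below zero, the correlation is negative; if it cannot be distinguished from zero (for example, its confidence interval includes zero), the data show no correlation. In every case, β1 describes an association, not a cause.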
In education, controlling for confounding variables is essential. For example, a researcher investigating how Variable X affects student performance on a test should consider those students' previous performance. If she only compares Variable X to test scores, she is not taking into account the fact that some students may have started out ahead of others or may have better test-taking skills in general. A regression analysis lets her include prior achievement as a control variable, so that Variable X is effectively compared among students with similar skill levels and previous achievements, holding those confounding variables constant.
Collinearity is also essential to check for. For example, in a comparison of students’ accuracy in solving word problems and their grades in the class, a researcher should be careful not to include both previous performance and performance on word problems as predictors in the same regression, as the two are likely highly correlated.
Need more education research? Have questions about regression analyses? Tweet us @ReasoningMind and like us on Facebook. Our next post covers Quasi-Experimental Set-ups, so be sure to keep us bookmarked!