Revision as of 11:27, 18 April 2012 by Jroy



In the Research Design chapter we discussed three general types of research: exploratory, descriptive, and explanatory studies. While the first two seek to describe aspects of the world, the last seeks to explain them. As social scientists we frequently see perplexing events and ask ourselves, “Why did that happen?” or sometimes, “Why did it happen that way?” Conversely, we will sometimes ask, “What are the results of that?” All of these questions raise issues of causality. We can look at the foreign aid that some nations receive and ask, “Why do some nations receive more foreign aid than others?” Alternatively, we can ask, “How does receiving large amounts of foreign aid affect the ability of nations to govern?” Both of these questions ask us to determine causality.

Modeling the relationship

In everyday English we talk about cause and effect. In statistics we talk about variables, so we use different language. Both the cause and the effect are variables: the variable we think of as the cause is called the independent variable, and the variable we think of as the effect is called the dependent variable. We frequently draw a word picture of the relationship, which we call a model:

Independent Variable → Dependent Variable

When we ask why some nations receive more foreign aid, foreign aid is the dependent variable. If we think that poorer nations are more likely to receive aid, we could model the relationship as:

Poverty → Foreign Aid

When we ask about the effect foreign aid has on governance, foreign aid becomes the independent variable. We would model this as:

Foreign Aid → Poor Governance

If you are conducting an explanatory study, it is very helpful to make this kind of model because it forces you to decide what is the cause and what is the effect. Sometimes what you thought of as your topic will belong on the right as the dependent variable. Suppose you begin with the topic of “war.” You will normally refine a topic like this into a research question such as “Why are some countries more likely to go to war than others?” As you conduct your literature review, you’ll find that your schools of thought belong on the left as independent variables. The literature will talk about many potential causes of war, but maybe the explanation you found most powerful stated that repression by the state leads to internal conflict and war. In this case you would model the relationship as:

Repression → War

But sometimes your topic will belong on the left as the independent variable. Suppose you initially choose the topic “poverty” to study. As you read the literature, you narrow your focus down to the research question: “How does poverty affect an individual’s ability to get an education?” In this case you have already identified education as your dependent variable. You would model this relationship as:

Poverty → Education

Note that the arrow only implies that there is a relationship between the two variables where the first affects the second. It does not explain what kind of relationship is found between the two variables. In this case, the model doesn’t distinguish between a positive relationship where increasing poverty leads to better education, or a negative relationship where increasing poverty leads to worse education. You’ll clarify the direction of the relationship in your hypothesis.


When you are thinking about the causal relationship between your independent variable (the cause) and your dependent variable (the effect), you should check for three things: empirical correlation, time sequencing, and non-spuriousness.


Correlation

At the heart of the notion of causation is an empirical event: changes in one variable are associated with changes in the other. Poor countries receive a lot of foreign aid while rich countries receive very little. Few children from poor families attend college while most children from rich families do.

Be careful, though. Correlation does not prove causation. There are lots of reasons that two variables could be correlated empirically without having a causal relationship.
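One common way to put a number on an empirical correlation is the Pearson correlation coefficient, which runs from -1 (a perfect negative relationship) to +1 (a perfect positive relationship). The sketch below is only an illustration: the function name `pearson_r` and the figures for five hypothetical countries are invented for this example and are not real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / sqrt(sxx * syy)


# Made-up numbers for five hypothetical countries (NOT real data).
income_per_person = [1, 2, 5, 20, 40]  # thousands of dollars
aid_per_person = [80, 65, 30, 4, 1]    # dollars

r = pearson_r(income_per_person, aid_per_person)
# In this toy data r is negative: the poorer countries receive more aid.
print(f"r = {r:.2f}")
```

A coefficient far from zero tells you only that the two variables move together. It says nothing about which one is the cause, or whether either one is.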


Time Sequencing

Second, make sure that your independent variable doesn’t occur after your dependent variable. It would make sense to say that the income of parents affects a child’s ability to attend college. You would model this as:

Parent income → Child College

The independent variable comes first; the dependent, second. But it wouldn’t make sense to say that a child’s income affects a parent’s level of education unless, of course, a rich child pays for their parent to go back to school later.


Non-spuriousness

Third, make sure that the relationship is not spurious. In a spurious relationship, the observed correlation has an alternative explanation. It could be due to chance; there could be a third variable which is affecting both (an antecedent variable); or there could be a third variable which is caused by the first, but is actually the main cause of the second (an intervening variable).

For example, there may be flooding in the Midwest at the same time that there is a drought in the South, but that correlation is due to chance. Or, owning a car may be highly correlated with having a good high school GPA, but it is the parents' income which is the antecedent variable causing both. Finally, rich people may be healthier than poor people, but the money per se doesn't lead to health. Rather, wealth leads to the intervening variable of good health care, which is the proximate cause of good health.
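The antecedent-variable case can be sketched with a small simulation. Everything here is an assumption made for illustration: the data are randomly generated, the variable names are hypothetical, and car ownership is simplified to a continuous "car access" score. Parent income drives both car access and GPA, and there is no direct link between the two, yet they end up correlated. Controlling for parent income with a first-order partial correlation makes most of that correlation disappear.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs (first-order partial correlation)."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 2000                # number of simulated students (arbitrary choice)

# Simulated, standardized scores; none of this is real data.
parent_income = [rng.gauss(0, 1) for _ in range(n)]
# Both outcomes depend on parent income plus independent noise.
# Neither outcome depends on the other.
car_access = [inc + rng.gauss(0, 1) for inc in parent_income]
gpa = [inc + rng.gauss(0, 1) for inc in parent_income]

raw = pearson_r(car_access, gpa)
controlled = partial_r(car_access, gpa, parent_income)

print(f"car access vs. GPA:             r = {raw:.2f}")
print(f"controlling for parent income:  r = {controlled:.2f}")
```

The first number should be clearly positive and the second close to zero, which is the signature of an antecedent variable. An intervening variable can produce the same pattern, so the statistics alone cannot tell the two apart; you need the time sequencing and the theory to decide which model fits.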



