Experimental Design




Experimental Validity

Experiments represent one method for testing a hypothesis that phenomenon A causes a change in phenomenon B. The experimenter will seek to test this causal relationship by creating phenomenon A (or observing phenomenon A in the real world) and then looking for the expected change in phenomenon B. To be scientifically persuasive, an experiment must have internal validity (the perceived changes in phenomenon B are caused by the experimental intervention); it must have construct validity (the experimental intervention must replicate phenomenon A and nothing else); and it must have external validity (the causal connection we witness in the experimental context must be generalizable to the outside world).

Internal Validity

An experiment possesses internal validity if we feel comfortable that any changes in our dependent variable are actually a result of our experimental "treatment". Threats to internal validity fall into two broad categories: time threats and group threats.

Time Threats

As the name suggests, time threats to internal validity have to do with the passage of time. There are three types of time threats to the internal validity of experiments:

  1. History - events external to the experiment cause the perceived change. For example, imagine a researcher who is conducting a long-term experiment on the effect of negative campaign ads on trust in government. He's conducting the experiment by measuring trust in government at various points during the course of a negatively charged congressional election. In the midst of his experiment, a major scandal involving the president hits the news. The researcher may measure a decline in trust in government, but that decline may be due to the White House scandal rather than the negative campaign ads.
  2. Maturation - the maturation and aging of the subjects causes the perceived changes. For example, imagine a researcher who is conducting a long-term experiment on how approval of the president changes over the course of a presidential term. Because a presidential term is four years, the researcher (a college professor) starts with a group of incoming college freshmen, most of whom are 18 years old. He tracks their approval of the president during the four years of the president's term and discovers that their approval generally declined over this time period. That change, however, may be the result of 18-year-olds growing up and, during the course of their college career, becoming more cynical.
  3. Instrumentation - changes in the way data is collected cause the perceived change. For example, imagine a researcher who is conducting a study on how the tenure of an African-American president affects individuals' perceptions of race relations. He starts by sitting down with groups of subjects to discuss their attitudes toward members of other racial and ethnic groups. Midway through the four-year experiment, he suffers a funding cut, so he can no longer afford to pay subjects mileage to his research center. As a result, he switches his method and begins sending the subjects an online survey. He finds that subjects seem much more hostile toward members of other racial groups toward the end of the experiment than they did at the beginning. There is no way to determine, however, whether this change has to do with the tenure of the president or with the fact that subjects feel more anonymous when they answer the online survey and are thus more willing to express racist beliefs.

Group Threats

Group threats are threats to internal validity that are based on the selection and identity of the subjects who participate in the experiment. There are four types of group threats to internal validity:

  1. Regression toward the mean - if you assign individuals to the treatment or control group based on their scores on some sort of pretest, you are likely to witness an apparent treatment effect simply because extreme scores tend to be moderated in repeated attempts. This idea of moderation is known as "regression toward the mean." If an individual takes an IQ test many times, they are unlikely to make the exact same score on each attempt. Statistically speaking, a person who scores exceptionally high on a first attempt will likely score lower (moving toward the average) on a second and third attempt. Similarly, a person who scores exceptionally low on a first attempt will likely score higher (moving toward the average) on a second and third attempt. This phenomenon can affect experimental results if researchers choose people who are extreme on some measure to participate in their study. For example, imagine a researcher who is interested in the effect of negative campaign ads on anxiety levels. In order to observe stronger effects, the researcher chooses subjects who score in the top ten percent on a psychological test of anxiety. The researcher then shows these subjects negative campaign ads and measures their anxiety levels. She discovers that their anxiety levels seem to have dropped. This is not terribly surprising, but it has nothing to do with the campaign ads. Rather, we would expect people with very high scores on a test of anxiety to score lower on subsequent tests.
  2. Selection based on confounding variable - the observed difference between the control and treatment groups results from some pre-existing difference between the groups rather than from the treatment itself. For example, imagine a researcher who is interested in how negative campaign ads affect more general trust in government. The researcher will show one group of people (the treatment group) a series of negative campaign ads and then evaluate both the treatment group and the control group on measures of trust in government. For simplicity's sake, the researcher (a professor) decides to use his introductory course as the control group and his senior-level class as the treatment group. He discovers that the treatment group (who saw the negative campaign ads) has lower trust in government. It may be that the ads themselves caused the difference between the two groups. Alternatively, it could be that older, more seasoned students are simply more cynical.
  3. Selection-maturation - the selection of subjects for the treatment and control group affects the rate of maturation for the two groups. For example, imagine our scholar who is interested in the effect of campaign ads on trust in government. She starts with a group of introductory American politics students (all of whom should be roughly "the same" at the beginning of the study). Because she will have more access to them, she puts all political science majors in the treatment group and all non-political science majors in the control group. As time passes, we will expect the students to mature in their thinking about the government. But we might expect the political science majors to mature at a faster rate (because they are taking courses in political science and thinking about political issues with more regularity).
  4. Mortality - if individuals drop out of the study in some systematic way, it may affect your results. For example, let's return once again to our scholar who is interested in the effect of negative campaign ads on anxiety levels. Subjects who watch the ads may, in fact, become so anxious that they leave the study altogether (refuse to continue participating). The researcher, then, will have no post-test data on these people. Instead, the subjects who are left may be the ones for whom anxiety levels did not rise. Her results will indicate that negative ads have no effect on anxiety, when in fact there may be such an effect.
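
Regression toward the mean (the first group threat above) can be made concrete with a small simulation. The sketch below is illustrative only: it assumes each observed anxiety score is a stable "true" level plus independent measurement noise, and every number in it (means, spreads, sample size) is invented for the demonstration rather than drawn from any study. No treatment is applied between the two tests, yet the subjects selected for extreme pretest scores still score lower, on average, the second time.

```python
import random
import statistics


def simulate_regression_to_mean(n_subjects=10_000, top_fraction=0.10, seed=0):
    """Simulate two anxiety tests with NO treatment in between.

    Assumption (hypothetical model): observed score = stable true level
    + independent measurement noise. All parameters are illustrative.
    """
    rng = random.Random(seed)
    true_level = [rng.gauss(50, 10) for _ in range(n_subjects)]
    pretest = [t + rng.gauss(0, 10) for t in true_level]
    posttest = [t + rng.gauss(0, 10) for t in true_level]

    # Keep only subjects whose pretest score falls in the top fraction.
    cutoff = sorted(pretest)[int(n_subjects * (1 - top_fraction))]
    selected = [i for i in range(n_subjects) if pretest[i] >= cutoff]

    pre_mean = statistics.mean([pretest[i] for i in selected])
    post_mean = statistics.mean([posttest[i] for i in selected])
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group, pretest mean:  {pre_mean:.1f}")
print(f"selected group, posttest mean: {post_mean:.1f}")
```

Because the high pretest scores partly reflect lucky (high) noise that does not repeat, the selected group's second average drifts back toward the overall mean of 50. A researcher who had shown these subjects campaign ads between the two tests could easily mistake that drift for a treatment effect.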

Construct Validity

Let us assume that the only factors affecting the observed behavior of our subjects are factors directly related to the experiment. We're not quite out of the woods yet. Ultimately, we want to be able to say that a particular aspect of the experimental experience causes the behavior we are studying. Yet there may be other aspects of the experience that are the real driving force behind the changes we witness. Construct validity refers to our certainty that we're studying exactly what we think we're studying.

Perhaps the best known example of a threat to construct validity is the placebo effect. Imagine a researcher who wants to determine whether a new drug will lower blood pressure. He assigns subjects to either a treatment group (which receives the drug) or a control group (which receives nothing). He finds that, after a month, the treatment group exhibits lower blood pressure. He concludes that his drug works.


Wrong. It's possible that another facet of the treatment--the simple act of taking a pill--lowers subjects' blood pressure. How can we fix this potential problem? We can give the control group a pill, too. Instead of giving them the drug (whose effectiveness we are studying), we can give them a sugar pill. It has no therapeutic benefit, but it gives the members of the control group the same comforting experience of taking a pill.

In the context of social science research, we see similar threats to construct validity. Imagine a researcher who is interested in the effects of watching television news about the environment on individual attitudes toward recycling. If the treatment group watches a news story about the environment while the control group does nothing at all, we cannot be certain that observed effects are based on the content of the news story; instead, they may be the result of watching the news more generally or of watching television. To control for this--and thus increase construct validity--the researcher could show the control group another TV news story that has nothing to do with the environment. Then, the only thing that would vary between the treatment and control group would be the content of the news story. Any differences between the groups could be attributed to that difference in content.

Construct validity is not just a function of adequate control. Construct validity also requires that the measures we are using actually measure what we think they're measuring. For example, imagine a researcher who is interested in the effect of anxiety on the ability to make political judgments. He attempts to create "anxiety" in subjects by having them play a difficult computer game (which is rigged to ensure they lose). Does his study have construct validity? It's hard to say. He's assuming that playing a difficult computer game induces anxiety (as opposed to frustration or anger). Perhaps he is correct about the effect of playing the video game, but perhaps he is not. In other words, concerns about construct validity implicate concerns about operationalization and measurement of variables.

There are many ways in which the very act of studying a group of people may introduce new factors into the equation of their behavior. Ultimately, most individuals like to please people in positions of authority. As a result, if experimental subjects figure out what researchers are expecting to see, they will try to provide it for them. On the other side of the coin, researchers often have a vested interest in the results of their studies and they expect their hypotheses to be confirmed. As they observe the subjects of their studies, they may see what they expect to see. To compensate for these social threats to construct validity, researchers have developed a number of tools: deceiving subjects about the true objective of a study; using third parties to collect data; and even using a computer to administer the experiment.

Perhaps the most common (and best-known) mechanism for minimizing the effect of subject and experimenter expectations on study results is the double-blind study. In a double-blind study, the subjects do not know whether they are in the treatment or control group, and neither does the researcher (the information about who is in which group is kept in a sealed file until the conclusion of the study).
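
One way to picture the bookkeeping behind a double-blind design is sketched below. This is a minimal illustration under stated assumptions, not a standard procedure: the function name `blind_assignment`, the arm codes "A" and "B", and the use of random assignment are all hypothetical choices made for the example. Subjects and data collectors see only the coded arm; the key that says which code means "treatment" plays the role of the sealed file.

```python
import random


def blind_assignment(subject_ids, seed=None):
    """Split subjects into two arms identified only by opaque codes.

    Returns (assignments, key):
      assignments: subject id -> arm code ("A" or "B"); this is all that
        subjects and data collectors get to see.
      key: arm code -> "treatment" or "control"; held by a third party
        and kept sealed until data collection ends.
    Hypothetical helper for illustration; random assignment is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    codes = ["A", "B"]
    rng.shuffle(codes)  # which code means "treatment" is itself random
    key = {codes[0]: "treatment", codes[1]: "control"}

    assignments = {sid: codes[0] for sid in shuffled[:half]}
    assignments.update({sid: codes[1] for sid in shuffled[half:]})
    return assignments, key


assignments, sealed_key = blind_assignment(range(1, 21), seed=42)
print(sorted(assignments.items())[:5])  # coded arms only, no treatment labels
```

The design choice worth noticing is the separation of the two return values: analysis can proceed on arms "A" and "B" without anyone involved in data collection knowing which is which, and `sealed_key` is consulted only at the end.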

External Validity

Like other types of social science inquiry, experiments are done with a purpose. The researcher hopes to move beyond simply describing what happened in the particular instance of the experiment to actually making inferences about the "real world" beyond the experimental setting. That ability to generalize the results of an experiment to the broader world is called external validity. There are three types of threats to external validity:

  1. Setting - In order to maintain control--to prevent extraneous factors from affecting results--researchers will often conduct experiments in a laboratory or under intensely regimented conditions. These efforts at keeping control over the factors that vary from subject to subject can undermine what we call "mundane realism"; in other words, the context in which the experiment takes place is artificial and not like the real world. Sometimes, the way in which people behave in these artificial settings differs from the way they behave in the real world. For example, let's look back at our researcher who is interested in the effects of campaign ads on levels of anxiety. A subject who watches a negative campaign ad in a sterile white room while sitting in a hard plastic chair may respond differently to that ad than will a person who is sitting in their cozy den, in their favorite easy chair, with a cat curled up in their lap.
  2. History - Sometimes we cannot generalize our results beyond a particular point in time. For example, imagine the political communications researcher who is interested in whether people pay more attention to television news stories that are labeled "breaking news." If that researcher has the great misfortune of conducting her research in late September of 2001, right after the terrorist attacks on the New York World Trade Center, she will likely find that, yes, people pay a great deal of attention to "breaking news." However, her results may be "time bound" -- an artifact of that moment in history.
  3. Selection - Sometimes the subjects that a researcher studies differ in meaningful ways from the population to which he would like to generalize his results. For social science researchers, this is a particular problem. Many researchers who work in academic settings will conduct experiments on the most readily available group of people: college students. Most of these researchers would like to generalize their results to the broader population. For example, imagine a researcher who is interested in how news coverage of political events affects political knowledge. She conducts an experiment on college students and discovers that the longer the news story, the more factual information the person retains. This finding may be true of college students--who have a particular skill at sitting through a lecture and retaining information and who have the necessary vocabulary to follow a detailed news account--but it may not be true of the broader population of American citizens.









