## Tests of differences of means and proportions

### From OPOSSEM

# Objectives

# Introduction

One of the primary goals of research is to examine the relationships between variables, usually with the aim of determining causal relationships. Often, a first step is to determine whether two groups have statistically different means on some variable of interest. **Note:** just because the sample means for two groups are *literally* different does not mean that they are *statistically* different in the population as a whole. The numerical difference may be an artifact of sampling, which is why a difference of means test is required.

A difference of means test has two possible outcomes: the means of the two groups *are* statistically different, or they *are not* statistically different.

The **Research (or Alternative) Hypothesis** posits that there **is** a statistically significant difference between the means of the two groups. The **Null Hypothesis** posits that there **is not** a statistically significant difference between the means of the two groups.

If there *is* a statistical difference, we **reject** the *null hypothesis* in favor of the *alternative hypothesis.* Conversely, if there *is not* a statistical difference, we **fail to reject** the *null hypothesis*; note that we never "accept" the null hypothesis, we simply lack sufficient evidence to reject it.

## Type I and Type II Errors

XXX Consider fearless deletion in favor of the Testing Hypotheses discussion (which should be a separate page).

Two types of error are possible when conducting statistical tests such as difference of means tests.

- A **Type I Error** is __rejecting__ the *null hypothesis* when it is, in fact, true and __accepting__ the *alternative hypothesis* when it is, in fact, false. This is also known as a **false positive**; that is, incorrectly finding a difference of means when there really is not one.

- A **Type II Error** is __accepting__ the *null hypothesis* when it is, in fact, false and __rejecting__ the *alternative hypothesis* when it is, in fact, true. This is also known as a **false negative**; that is, failing to find a statistical difference in means when there really is one.

**See also:** Testing Hypotheses

# Independent samples (unpaired) t test

The independent samples t test (or unpaired difference of means test) is used to compare the population means of a dependent variable between two groups (or categories of an independent variable); for example, we might compare the average "feeling thermometer" rating of a politician between men and women, or the GDP growth rates of democratic and non-democratic countries.

The test statistic for the test is given by:

<math> t = \frac{\bar{y}_2 - \bar{y}_1}{s_{\bar{y}_2-\bar{y}_1}}</math>, where <math>s_{\bar{y}_2-\bar{y}_1} = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\sqrt{{\frac{1}{n_1}}+{\frac{1}{n_2}}}</math> with <math>\text{df} = n_1+n_2-2</math>

where the subscripts 1 and 2 represent the two groups, <math>y</math> is the variable being compared across the groups (the dependent variable), and <math>s_1</math> and <math>s_2</math> are the standard deviations of that variable within each group.

As always, the test has three steps:

- Determine the obtained value of the test statistic, <math>t_\text{ob}</math>.
- Find the critical value of the test statistic, <math>t_\text{crit}</math>, with <math>n_1 + n_2 - 2</math> degrees of freedom.
- Compare the obtained and critical values of the test statistic:
  - If <math>|t_\text{ob}| < t_\text{crit}</math>, we fail to reject the null hypothesis and conclude that we do not have sufficient evidence that the two means are different in the population.
  - If <math>|t_\text{ob}| \geq t_\text{crit}</math>, we reject the null hypothesis and conclude that there is a difference between the two population means.

## Example

In a survey of 426 Canadian voters, respondents were asked to rate Prime Minister Stephen Harper on a "feeling thermometer" scale of 0 to 100. A researcher wishes to compare the evaluations between French-speakers (Francophones) and English-speakers (Anglophones). Among the 120 respondents who speak French as their first language, the mean rating of Harper was 45 with a standard deviation of 20, while among the 306 English-speaking respondents the mean rating of Harper was 52 with a standard deviation of 24. Can we be confident, at a 95% confidence level, that French speakers have a lower average rating of Harper than English speakers?

XXX Work example
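As a sketch of how the steps above play out for this example, the arithmetic can be carried out in Python (the code is illustrative and not part of the original text):

```python
import math

# Survey data from the example: thermometer ratings of Harper
n1, ybar1, s1 = 120, 45.0, 20.0   # Francophone respondents
n2, ybar2, s2 = 306, 52.0, 24.0   # Anglophone respondents

# Pooled standard error of the difference of means (formula above)
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)

t_ob = (ybar2 - ybar1) / se   # obtained t, about 2.83 with 424 df
```

With <math>n_1 + n_2 - 2 = 424</math> degrees of freedom, the one-tailed 5% critical value is about 1.65; since 2.83 exceeds it, we reject the null hypothesis and conclude that Anglophones rate Harper higher (equivalently, Francophones rate him lower) on average in the population.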

## Extension to more than two groups

For comparisons across more than two groups (or categories of the independent variable), you can use either ANOVA or linear regression.

# Paired samples t test

The paired samples t test is used to compare the values of a variable before and after an event, most commonly as part of an experimental design. The test statistic is given by:

- <math>t_\text{ob} = \frac{\overline{y}_D}{s_D/\sqrt{n}}. </math>

where <math>\bar{y}_D</math> is the mean of the differences between the pre- and post-test observations, <math>s_D</math> is the standard deviation of those differences, and <math>n</math> is the number of pairs.

As always, the test has three steps:

- Determine the obtained value of the test statistic, <math>t_\text{ob}</math>.
- Find the critical value of the test statistic, <math>t_\text{crit}</math>, with <math>n - 1</math> degrees of freedom.
- Compare the obtained and critical values of the test statistic:
  - If <math>|t_\text{ob}| < t_\text{crit}</math>, we fail to reject the null hypothesis and conclude that we do not have sufficient evidence that the two means are different in the population.
  - If <math>|t_\text{ob}| \geq t_\text{crit}</math>, we reject the null hypothesis and conclude that there is a difference between the two population means.
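To make the formula concrete, here is a sketch with *hypothetical* pre- and post-test data (the numbers are invented for illustration):

```python
import math

# Hypothetical thermometer ratings for 8 respondents, before and after
# viewing a negative ad about the candidate
pre  = [62, 55, 70, 58, 64, 51, 67, 60]
post = [57, 54, 63, 55, 60, 50, 66, 52]

diffs = [b - a for a, b in zip(pre, post)]   # post minus pre
n = len(diffs)
dbar = sum(diffs) / n                        # mean difference
s_d = math.sqrt(sum((d - dbar) ** 2 for d in diffs) / (n - 1))

t_ob = dbar / (s_d / math.sqrt(n))           # about -3.84 with n - 1 = 7 df
```

Here <math>|t_\text{ob}| \approx 3.84</math> exceeds the two-tailed 5% critical value of about 2.36 with 7 degrees of freedom, so with these (invented) data we would reject the null hypothesis of no change.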

## Example

XXX Cite similar research in Iyengar

A researcher is interested in seeing if negative campaign ads are effective in reducing political support for their targets. Blah blah blah.

## Beyond two waves

If testing for differences beyond a simple "pre-test, post-test" design (for example, across multiple stages of intervention), an appropriate technique is repeated measures ANOVA.

# The difference of proportions test

If we have a nominal or ordinal outcome variable, the difference of means test is not appropriate. Instead, we can use the *difference of proportions test* to identify whether the proportion of respondents falling into a particular category of the outcome variable is the same between two distinct groups. The formula for the test statistic is:

<math>Z_\text{ob} = \frac{p_2 - p_1}{s_{p_2-p_1}}</math>, where <math>s_{p_2-p_1} = \sqrt{\frac{(p_1 n_1 + p_2 n_2)\left((1-p_1) n_1 + (1-p_2) n_2\right)}{(n_1 + n_2)^2}\left({\frac{1}{n_1}}+{\frac{1}{n_2}}\right)}</math>

where <math>p_1</math> and <math>p_2</math> are the proportions falling into the category of interest in each group and <math>n_1</math> and <math>n_2</math> are the group sample sizes.

Note that this test uses the normal (Z) distribution, rather than the t distribution, as it is only valid for large samples.
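As an illustration, the test statistic can be computed via the pooled proportion <math>\hat{p} = (p_1 n_1 + p_2 n_2)/(n_1 + n_2)</math>; the group data below are hypothetical:

```python
import math

# Hypothetical data: proportion supporting a policy in two groups
n1, p1 = 200, 0.55
n2, p2 = 250, 0.44

# Pooled proportion under the null hypothesis of equal proportions
p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z_ob = (p2 - p1) / se   # about -2.32
```

Since <math>|Z_\text{ob}| \approx 2.32</math> exceeds the two-tailed 5% critical value of 1.96, with these hypothetical data we would reject the null hypothesis of equal proportions.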

## Example

A recent survey of Irish voters conducted by the Irish Times [1] reports that, among the 510 respondents who are not undecided, 58% will support ratification of the European Fiscal Compact in the upcoming referendum while 42% plan to oppose it. Can we be confident that there are more supporters than opponents of ratification among the mass public?

XXX Work example.

## Alternatives to the difference of proportions test

Nominal and ordinal differences of proportions can also be tested using the chi-square test, particularly when there are more than two groups or more than two categories of the dependent variable.

In small samples, an exact test is more appropriate. XXX Explain this further.
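The most common exact test for a 2×2 table is Fisher's exact test; here is a sketch using scipy (assuming scipy is available; the table values are hypothetical):

```python
from scipy.stats import fisher_exact

# Hypothetical small-sample 2x2 table: rows are the two groups,
# columns are the two categories of the outcome variable
table = [[8, 2],
         [1, 5]]

odds_ratio, p_value = fisher_exact(table)  # two-sided by default
```

Unlike the Z test above, the p-value here is computed exactly from the hypergeometric distribution rather than from a large-sample normal approximation, so it remains valid when cell counts are small.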

# Conclusion

# References

# Discussion questions

# Problems

# Glossary

- [[Def: ]]
- [[Def: ]]
- [[Def: ]]