
Single sample tests of means and proportions



Objectives

Introduction

Sometimes social scientists want to be able to test a hypothesis comparing the mean of a sample to some "ideal" mean. For example, a local government might want to see if, on average, drivers are obeying a 25 mph (40 km/h) speed limit in a school zone, or the State Department of Motor Vehicles might want to ensure that the average wait time for driver's license applicants at various offices is less than 15 minutes.

In this module we introduce statistical hypothesis testing by extending the concept of the confidence interval to consider these problems.

The Z test

The Z test for the mean tests the null hypothesis that the population mean μ is equal to some specific test value, μ0, or:

<math>H_0 : \mu = \mu_0</math>

The test statistic Z is then given by the following formula:


<math>Z = \frac{\bar{y}-\mu_0}{\sigma_{\bar{y}}}</math>, where <math>\mu_0</math> is the test value.


where the standard error of the sample mean, <math>\sigma_{\bar{y}}</math>, is given by the following (introduced with confidence intervals):

<math>\sigma_{\bar{y}} = \frac{\sigma_y}{\sqrt{n}}</math>


The hypothesis testing procedure for the single-sample Z test is as follows:

  1. Identify the value of μ0 you wish to test.
  2. Find Z using the equation above.
  3. Identify the critical value of the normal statistic, <math>Z_\text{crit}</math>, for the confidence level you want to use, from the standard normal table.
  4. Compare the critical value with the Z found using the formula.
    • If <math>|Z| > Z_\text{crit}</math>, we can reject the null hypothesis and conclude that there is a statistically significant difference between the true population mean μ and the test value μ0; in other words, we are confident that μ is not the same as μ0.
    • If <math>|Z| \leq Z_\text{crit}</math>, we fail to reject the null hypothesis and are unable to conclude that μ and μ0 are different.

We will follow this procedure in the example below:

Example: The university administration is concerned that students are spending an inordinate amount of time at the end of the semester waiting in line at the campus bookstore to sell back their books, and sets a benchmark wait time of no more than 6 minutes on average. They decide to monitor the situation by measuring the average wait time one day during the buy-back period, and find that 212 students waited in line for an average (mean) of 8 minutes. The population standard deviation is assumed to be 3 minutes. Assuming this day is typical, can the administration be confident, at the 95% confidence level, that the 6-minute benchmark is being exceeded throughout the buy-back period?
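One way to see the mechanics of the procedure is to carry out the arithmetic directly. The short sketch below, written in R (one of the packages covered in the Computing Notes section), simply plugs the summary figures from the example into the Z formula above; because only summary statistics are given, the statistic is computed by hand rather than with a built-in function.

  # Single-sample Z test for the bookstore example, using the summary figures above
  ybar  <- 8      # sample mean wait time (minutes)
  mu0   <- 6      # benchmark wait time being tested (minutes)
  sigma <- 3      # assumed population standard deviation (minutes)
  n     <- 212    # number of students observed

  se <- sigma / sqrt(n)     # standard error of the sample mean
  z  <- (ybar - mu0) / se   # Z test statistic (about 9.7)

  z_crit <- qnorm(0.975)    # two-sided critical value at the 95% level (about 1.96)
  abs(z) > z_crit           # TRUE: reject the null hypothesis that the true mean is 6

Because the calculated Z of roughly 9.7 is far larger than the critical value of about 1.96, the administration can reject the null hypothesis that the true mean wait time is 6 minutes; since the sample mean is above the benchmark, the data indicate that the benchmark is indeed being exceeded.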

The t test

Just as in the section on confidence intervals, we encounter the problem that it is unrealistic in most circumstances to believe that we would know the population standard deviation without also knowing the population mean; if we knew both quantities already, there would be no need to engage in statistical inference to begin with! But, as William Gosset found, we cannot simply use the sample standard deviation as a stand-in for the population standard deviation. So again we use his Student's t distribution to adjust for the error inherent in using the sample standard deviation.

We are still testing the same hypothesis as we did in the Z test: <math>H_0 : \mu = \mu_0</math>.

The test statistic t is given by the t test formula:

<math>t = \frac {\bar{y}-\mu_0}{s_\bar{y}}</math>, where <math>\mu_0 </math> is the test value and <math>\text{df} = n-1</math>.



Again, note that we need to find the appropriate number of degrees of freedom when using the t distribution. In this case, as with confidence intervals, we use <math>n-1</math> degrees of freedom, where <math>n</math> represents the sample size.

The standard error of the sample mean is again given by:


<math>s_\bar{y} = \frac{s_y}{\sqrt{n}}</math>


The hypothesis testing procedure for the single-sample t test is similar to that of the single-sample Z test; the only substantial differences are that we need to identify the number of degrees of freedom before finding the critical value, and that we need to use the Student's t distribution rather than the standard normal distribution for the test:

  1. Identify the value of μ0 you wish to test.
  2. Find t using the equation above.
  3. Identify the critical value of the t statistic, <math>t_\text{crit}</math>, for the confidence level you want to use and with the appropriate number of degrees of freedom, <math>n-1</math>, from the t distribution table.
  4. Compare the critical value with the t found using the formula.
    • If <math>|t| > t_\text{crit}</math>, we can reject the null hypothesis and conclude that there is a statistically significant difference between the true population mean μ and the test value μ0; in other words, we are confident that μ is not the same as μ0.
    • If <math>|t| \leq t_\text{crit}</math>, we fail to reject the null hypothesis and are unable to conclude that μ and μ0 are different.

We will follow this procedure in the example below:


Example: A South Carolina state legislator believes that increasing the speed limit on I-95 in the state will increase compliance with the law. She argues that increasing the speed limit to 75 miles per hour (120 km/h) will mean that the average driver will now be obeying the posted speed limit. The state department of transportation conducts a study of traffic on the highway and finds that, based on monitoring the speeds of 421 cars, the mean speed of the cars is 74 mph with a standard deviation of 4 mph. Assuming this is a random sample of I-95 traffic, can we be confident (at the 95% confidence level) that the true average speed of the "population" of all traffic is 75 miles per hour or less?
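As with the bookstore example, only summary statistics are reported here, so R's t.test function (which expects the raw observations) cannot be applied directly; the sketch below instead computes the t statistic by hand from the figures given in the example.

  # Single-sample t test for the I-95 example, using the summary figures above
  ybar <- 74      # sample mean speed (mph)
  mu0  <- 75      # posted speed limit being tested (mph)
  s    <- 4       # sample standard deviation (mph)
  n    <- 421     # number of cars monitored
  df   <- n - 1   # degrees of freedom

  se     <- s / sqrt(n)        # standard error of the sample mean
  t_stat <- (ybar - mu0) / se  # t test statistic (about -5.1)

  t_crit <- qt(0.975, df)      # two-sided critical value at the 95% level (about 1.97)
  abs(t_stat) > t_crit         # TRUE: reject the null hypothesis that the true mean is 75

Because |t| (about 5.1) exceeds the critical value of roughly 1.97 with 420 degrees of freedom, we reject the null hypothesis that the true mean speed is 75 mph; since the sample mean is below 75 mph, the data are consistent with the legislator's claim that the average driver would be at or under the new limit. A strictly one-sided version of the test would instead use the one-tailed critical value qt(0.95, df), roughly 1.65, which leads to the same conclusion here.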


The Z test for proportions

In situations involving nominal or ordinal data, as discussed in the section on confidence intervals, we cannot use the mean as a statistic. Instead, we can use a proportions test to arrive at similar conclusions. For example, a researcher interested in the effectiveness of a state university's affirmative action program may want to test whether or not the proportion of non-white students admitted to the university is consistent with the state's percentage of non-white high school graduates.

In the Z test of proportions, the null hypothesis is that the true population proportion π is equal to the test value π0, or:

<math>H_0 : \pi = \pi_0</math>

To test this hypothesis we use the Z test formula for proportions, specified below:


<math>Z = \frac {p-\pi_0}{\sigma_\pi}</math>, where <math>\pi_0 </math> is the test value.


As we learned in the section on confidence intervals, the standard error of the proportion is given by:


<math>\sigma_\pi = \frac{\sqrt{p(1-p)}}{\sqrt{n}}</math> .

Most statistical software packages also include a set of nonparametric binomial tests, which are more accurate than the Z test for small samples. These procedures are discussed in the Computing Notes section below.
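Before turning to those procedures, it may help to see the Z formula for proportions applied directly. The sketch below uses hypothetical figures (say, 120 non-white students among 400 admits, tested against a benchmark proportion of 0.35) purely to illustrate the calculation in R; note, too, that some textbooks compute the standard error using the test value <math>\pi_0</math> in place of <math>p</math>, which gives very similar results in large samples.

  # Z test for a single proportion, applying the formula above
  # (the counts here are hypothetical, chosen only to illustrate the calculation)
  x   <- 120        # number of "successes" (e.g., non-white students admitted)
  n   <- 400        # sample size
  p   <- x / n      # sample proportion (0.30)
  pi0 <- 0.35       # test value

  se <- sqrt(p * (1 - p)) / sqrt(n)  # standard error of the proportion
  z  <- (p - pi0) / se               # Z test statistic (about -2.2)

  z_crit <- qnorm(0.975)             # two-sided critical value at the 95% level
  abs(z) > z_crit                    # TRUE here: reject the null hypothesis that pi = 0.35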

The relationship between the t and Z tests and confidence intervals

The single-sample Z and t tests can be thought of as a simple transformation of the construction of confidence intervals, although the purpose of each approach is somewhat different. When constructing confidence intervals we are interested in finding the values of the population parameter (mean or proportion) that are likely to have led to the sample statistic that we observe; the single-sample tests, on the other hand, are concerned with the question of whether it is plausible for a researcher to believe that a specific value of the population parameter led to the sample statistic we have found. In other words, while confidence intervals are largely exploratory in nature, the single-sample Z and t tests are true hypothesis tests; we are evaluating whether or not a specific hypothesis is true, rather than finding a range of values for which we might find the hypothesis to be true.
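This connection can be stated precisely. For the two-sided test and a confidence interval constructed at the same confidence level, the single-sample t test rejects <math>H_0 : \mu = \mu_0</math> exactly when <math>\mu_0</math> falls outside the interval, since

<math>\frac{|\bar{y}-\mu_0|}{s_{\bar{y}}} > t_\text{crit} \iff \mu_0 \notin \left[ \bar{y} - t_\text{crit} \, s_{\bar{y}},\ \bar{y} + t_\text{crit} \, s_{\bar{y}} \right]</math>

and the same algebra applies to the Z test with <math>\sigma_{\bar{y}}</math> and <math>Z_\text{crit}</math>.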

Computing Notes

Most statistical software packages, including SPSS, Stata, and R, when asked for a single-sample test will automatically perform a t test, even for large samples; as discussed above, the Z test is really only useful when the population standard deviation is known, so it is of limited use in practical applications. If for some reason you want to calculate the Z test, you will have to do it by hand, using the formula presented in this chapter.

The single-sample test procedures in these programs are also capable of providing the confidence intervals for the population mean, again using the t distribution.

SPSS

A single-sample test of means can be obtained in SPSS using the menus via Analyze → Means → One-sample t test.


SPSS does not implement the single-sample test of proportions (at least in the menus). However, the more exact binomial proportions test, discussed above, is available in SPSS using Analyze → Nonparametric Tests → Binomial.


Stata

Single-sample tests of means are available in Stata using the ttest procedure.


Single-sample tests of proportions are available using prtest (for large sample sizes) and bitest (for small sample sizes).


R

Single-sample tests of means are available in R using the t.test function.
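As a minimal example, assuming wait_times holds a vector of observed wait times (the values below are made up purely for illustration):

  # hypothetical raw data; in practice this would be a variable from your data set
  wait_times <- c(7, 9, 5, 11, 8, 6, 10, 7)

  # one-sample t test of H0: mu = 6; the output also reports the 95% confidence interval
  t.test(wait_times, mu = 6)

The conf.level argument (e.g., conf.level = 0.99) changes the confidence level of the reported interval.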


Single-sample tests of proportions are available using prop.test (for large sample sizes) and binom.test (for small sample sizes).
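A minimal sketch of both functions, reusing the hypothetical counts from the proportions example above (120 successes out of 400 trials, tested against a proportion of 0.35):

  prop.test(x = 120, n = 400, p = 0.35)   # large-sample test (reports a chi-squared statistic)
  binom.test(x = 120, n = 400, p = 0.35)  # exact binomial test, preferred for small samples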


Conclusion

References

Discussion questions

Problems

Glossary