Single sample tests of means and proportions

From OPOSSEM

Revision as of 09:15, 9 July 2011


Introduction

Social scientists often want to test a hypothesis that compares the mean of a sample to some "ideal" or benchmark value. For example, a local government might want to see if, on average, drivers are obeying a 25 mph (40 km/h) speed limit in a school zone, or the State Department of Motor Vehicles might want to ensure that the average wait time for driver's license applicants at various offices is less than 15 minutes.

In this module we introduce statistical [[hypothesis testing]] by extending the concept of [[confidence interval|confidence intervals]] to these problems.

The Z test

<math>Z = \frac{\bar{y}-\mu_0}{\sigma_{\bar{y}}}</math>, where <math>\mu_0</math> is the test value and <math>\sigma_{\bar{y}} = \sigma / \sqrt{n}</math> is the standard error of the mean.

Example: The university administration is concerned that students are spending an inordinate amount of time at the end of the semester waiting in line at the campus bookstore to sell back their books, and sets a benchmark wait time of no more than 6 minutes on average. They decide to monitor the situation by measuring the average wait time one day during the buy-back period, and find that 212 students waited in line for an average (mean) of 8 minutes. The population standard deviation is assumed to be 3 minutes. Assuming this day is typical, can the administration be confident, at the 95% confidence level, that the benchmark is being exceeded throughout the buy-back period?
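
As a sketch of how this example works out, take the null hypothesis to be that the mean wait is at most 6 minutes and use a one-sided test. The one-sided framing and the critical value of 1.645 are our reading of the question, not something stated in the example itself. With <math>\sigma_{\bar{y}} = \sigma / \sqrt{n}</math> as the standard error of the mean:

<math>\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{212}} \approx 0.206</math>

<math>Z = \frac{\bar{y}-\mu_0}{\sigma_{\bar{y}}} = \frac{8-6}{0.206} \approx 9.71</math>

Because 9.71 is far larger than 1.645, the one-sided critical value at the 5% level, the administration can conclude that the mean wait exceeds the 6-minute benchmark.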

The t test

Just as in the section on [[confidence intervals]], we encounter the problem that it is unrealistic in most circumstances to believe that we would know the population standard deviation without also knowing the population mean; if we knew both quantities already, there would be no need to engage in statistical inference to begin with! But, as William Gosset found, we cannot simply substitute the sample standard deviation for the population standard deviation and continue to rely on the normal distribution. So again we use his Student's t distribution to carry out the test:

<math>t = \frac{\bar{y}-\mu_0}{s_{\bar{y}}}</math>, where <math>\mu_0</math> is the test value, <math>s_{\bar{y}} = s / \sqrt{n}</math> is the estimated standard error of the mean, and <math>\text{df} = n-1</math>.


Example: A South Carolina state legislator believes that increasing the speed limit on I-95 in the state will increase compliance with the law. She argues that increasing the speed limit to 75 miles per hour (120 km/h) will mean that the average driver will now be obeying the posted speed limit. The state department of transportation conducts a study of traffic on the highway and finds that, based on monitoring the speeds of 421 cars, the mean speed of the cars is 74 mph with a standard deviation of 4 mph. Assuming this is a random sample of I-95 traffic, can we be confident (at the 95% confidence level) that the true average speed of the "population" of all traffic is 75 miles per hour or less?
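
One way to work this example, offered as a sketch: treat <math>H_0: \mu \ge 75</math> against the one-sided alternative <math>H_a: \mu < 75</math>. The one-sided setup and the approximate critical value quoted below are assumptions on our part. Writing <math>s_{\bar{y}} = s / \sqrt{n}</math> for the estimated standard error of the mean:

<math>s_{\bar{y}} = \frac{s}{\sqrt{n}} = \frac{4}{\sqrt{421}} \approx 0.195</math>

<math>t = \frac{\bar{y}-\mu_0}{s_{\bar{y}}} = \frac{74-75}{0.195} \approx -5.13, \qquad \text{df} = 421 - 1 = 420</math>

With 420 degrees of freedom the one-sided 5% critical value is approximately -1.65, so a t statistic of about -5.13 falls well inside the rejection region, and we can be confident that the mean speed of all traffic is below 75 mph.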


The Z test for proportions

In situations involving nominal or ordinal data, as discussed in the section on [[confidence intervals]], we cannot use the mean as a statistic. Instead, we can use a proportions test to arrive at similar conclusions. For example, a researcher interested in the effectiveness of a state university's affirmative action program may want to test whether or not the proportion of non-white students admitted to the university is consistent with the state's percentage of non-white high school graduates.

In the Z test of proportions, the null hypothesis is that the true population proportion <math>\pi</math> is equal to the test value <math>\pi_0</math>, or:

<math>H_0 : \pi = \pi_0</math>

To test this hypothesis we use the Z test formula for proportions, specified below:

<math>Z = \frac{p-\pi_0}{\sigma_\pi}</math>, where <math>p</math> is the sample proportion and <math>\pi_0</math> is the test value.

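As we learned in the section on confidence intervals, the standard error of the proportion is given by

<math>\sigma_\pi = \frac{\sqrt{p(1-p)}}{\sqrt{n}}</math>.

Many textbooks and software routines instead compute this standard error under the null hypothesis, substituting <math>\pi_0</math> for <math>p</math>.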

Most statistical software packages also include a set of nonparametric binomial tests, which do not rely on the normal approximation and are therefore more accurate than the Z test for small samples. These procedures are discussed in the Computing Notes section below.
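
The following Python sketch shows how the Z test for proportions and an exact binomial test might be computed by hand. The function names and the data (180 non-white students among 1,000 admitted, against an assumed statewide share of 22%) are hypothetical and chosen only for illustration. The standard error is computed from the sample proportion, as in the formula above; routines that use the test value instead will give a slightly different Z.

```python
import math


def z_test_proportion(successes, n, pi0):
    """Z statistic for H0: pi = pi0, using the sample-based standard error."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # sigma_pi = sqrt(p(1-p)) / sqrt(n)
    return (p - pi0) / se


def upper_tail_normal(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def exact_binomial_lower_p(successes, n, pi0):
    """Exact one-sided p-value P(X <= successes) when X ~ Binomial(n, pi0)."""
    return sum(
        math.comb(n, k) * pi0**k * (1 - pi0) ** (n - k)
        for k in range(successes + 1)
    )


# Hypothetical data: 180 non-white students among 1,000 admitted,
# compared with an assumed statewide share of 22%.
z = z_test_proportion(180, 1000, 0.22)
p_lower = upper_tail_normal(-z)  # P(Z <= z), by symmetry of the normal curve
p_exact = exact_binomial_lower_p(180, 1000, 0.22)
print(f"Z = {z:.2f}, one-sided p = {p_lower:.4f}, exact p = {p_exact:.4f}")
```

Both p-values here are one-sided (testing whether the admitted proportion falls below the test value); a two-sided version would need to account for both tails.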

The relationship between the t and Z tests and confidence intervals

The single-sample Z and t tests can be thought of as a simple transformation of the construction of [[confidence intervals]], although the purpose of each approach is somewhat different. When constructing confidence intervals we are interested in finding the values of the population parameter (mean or proportion) that are likely to have led to the sample statistic that we observe; the single-sample tests, on the other hand, are concerned with the question of whether it is plausible for a researcher to believe that a ''specific'' value of the population parameter led to the sample statistic we have found. In other words, while confidence intervals are largely exploratory in nature, the single-sample Z and t tests are true ''hypothesis tests''; we are evaluating whether the data are consistent with one specific hypothesized value, rather than finding the range of values with which the data are consistent.
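
To make this relationship concrete, the Python sketch below reuses the bookstore example and checks that a two-sided Z test at the 5% level rejects a test value exactly when that value lies outside the 95% confidence interval. The function names are our own, and the two-sided framing is an assumption made so that the test lines up with a conventional two-sided interval.

```python
import math
from statistics import NormalDist


def z_confidence_interval(ybar, sigma, n, level=0.95):
    """Confidence interval for the mean when sigma is known."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ybar - z_crit * se, ybar + z_crit * se


def two_sided_z_rejects(ybar, mu0, sigma, n, level=0.95):
    """True if a two-sided Z test rejects H0: mu = mu0 at the given level."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return abs((ybar - mu0) / se) > z_crit


# Bookstore example: n = 212, mean 8 minutes, sigma = 3, test value 6.
low, high = z_confidence_interval(8, 3, 212)
outside = not (low <= 6 <= high)
rejects = two_sided_z_rejects(8, 6, 3, 212)
print(f"95% CI: ({low:.2f}, {high:.2f}); 6 outside: {outside}; rejects: {rejects}")
```

Any test value inside the interval would not be rejected, and any value outside it would be, which is the sense in which the test is a transformation of the interval.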

Computing Notes

Most statistical software packages, including SPSS, Stata, and R, automatically perform a t test when asked for a single-sample test of means, even for large samples; as discussed above, the Z test is really only useful when the population standard deviation is known, so it is of limited use in practical applications. If for some reason you want to calculate the Z test, you will have to do it by hand, using the formula presented in this chapter.
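
Doing the Z test "by hand" need not mean pencil and paper. As a minimal sketch, the Python function below applies the Z formula from this chapter using only the standard library; the function name and the choice of an upper-tail p-value are our assumptions, not part of any package mentioned here.

```python
import math


def z_test_mean(ybar, mu0, sigma, n):
    """Return the Z statistic and the upper-tail p-value for H0: mu <= mu0."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    z = (ybar - mu0) / se
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return z, p_upper


# Bookstore example from this chapter: 212 students, mean wait 8 minutes,
# assumed population standard deviation 3 minutes, benchmark 6 minutes.
z, p = z_test_mean(ybar=8, mu0=6, sigma=3, n=212)
print(f"Z = {z:.2f}, one-sided p-value = {p:.3g}")
```

For a lower-tail or two-sided test, the p-value line would need to be adjusted accordingly.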

The single-sample test procedures in these programs can also provide [[confidence intervals]] for the population mean, again using the t distribution.

SPSS

A single-sample test of means can be obtained in SPSS using the menus via Analyze → Compare Means → One-Sample T Test.

SPSS does not implement the single-sample test of proportions (at least in the menus). However, the more exact binomial proportions test, discussed above, is available in SPSS using Analyze → Nonparametric Tests → Binomial.


Stata

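Single-sample tests of means are available in Stata using the <tt>ttest</tt> command; for example, <tt>ttest speed == 75</tt> tests whether the mean of a (hypothetical) variable <tt>speed</tt> equals 75.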

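Single-sample tests of proportions are available using <tt>prtest</tt> (for large sample sizes) and <tt>bitest</tt> (for small sample sizes); both take the same form, for example <tt>prtest nonwhite == 0.22</tt>, where <tt>nonwhite</tt> is a hypothetical 0/1 variable.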


R

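Single-sample tests of means are available in R using the <tt>t.test</tt> function; for example, <tt>t.test(speed, mu = 75)</tt>, where <tt>speed</tt> is a hypothetical numeric vector.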

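Single-sample tests of proportions are available using <tt>prop.test</tt> (for large sample sizes) and <tt>binom.test</tt> (for small sample sizes); for example, <tt>prop.test(180, 1000, p = 0.22)</tt> or <tt>binom.test(180, 1000, p = 0.22)</tt>, where the counts are hypothetical.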


