Statistical distributions

From OPOSSEM

<!--  
1. At the moment [11.07.07] I'm just putting the equations in-line, since I'm converting this from LaTeX. I'll move the equations to references later

2. Images courtesy of Chris Zorn
-->

=Objectives=
*
*
*
*

=Introduction=
In the previous chapter we discussed probability theory, which we expressed in terms of a variable <math>X</math>.  We defined <math>X</math> as a set of realizations of some process, which in turn is governed by rules of probability regarding potential outcomes in the sample space.  

The variables we have been talking about are what are called ''random variables'', which means that they have a probability distribution.  As we noted before, broadly speaking, there are two kinds of random variables: ''discrete'' and ''continuous''.

''Discrete'' variables can take on any one of several distinct, mutually exclusive values.
* A congressperson's ideology score {0, 1, 2, 3, ..., 100}
* An individual's political affiliation (Democrat, Republican, Independent)
* Whether or not a country is a member of the European Union (true/false)

A ''continuous'' variable can take on ''any'' value in its range.
* Individual income
* National population

This chapter focuses on a family of continuous distributions that are the most widely used in statistical inference and are found in a wide variety of contexts, both applied and theoretical. The Normal distribution is the well-known "bell-shaped curve" that most students first encounter in the artificial context of academic testing but which, due to a powerful result called the Central Limit Theorem, also occurs in a wide variety of uncontrolled situations where the value of a random variable is determined by the average effect of a large number of random variables with any combination of distributions. The <math>\chi^{2}</math>, <math>t</math> and <math>F</math> distributions can be derived from various products of normally-distributed variables, and are used extensively in statistical inference and applied statistics, so it's useful to understand them in a bit of depth.  

==Need to do==
[[User:Philip Schrodt|Philip Schrodt]] 06:57, 13 July 2011 (PDT)
*Probably need to get most of the probability chapter---which at the moment hasn't been started---written before this one. In particular, will the pdf and cdf be defined there or here?
*Add some of the discrete distributions, particularly the binomial
*Add the uniform?
*Do we add---or link to on another page---the derivation of the mean and standard errors for these: that code is available in CCL on an assortment of places on the web

= The Normal Distribution =
We are all used to seeing normal distributions described, and to hearing that something is "normally distributed."  We know that a normal distribution is "bell-shaped" and symmetrical, and probably that it has some mean and some standard deviation.  

Formally, if <math>X</math> is a ''normally distributed'' variate with mean <math>\mu</math> and variance <math>\sigma^{2}</math>, then its density is:  

<math>f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( - \frac{(x - \mu)^{2}}{2 \sigma^{2}} \right)</math>.

We denote this <math>X \sim N(\mu,\sigma^{2})</math>, and say "<math>X</math> is distributed normally with mean mu and variance sigma squared."  The symbol <math>\phi</math> is often used as a shorthand for the normal density above:

<math>X \sim \phi_{\mu, \sigma^{2}}</math>.

The corresponding normal CDF -- which is the probability of a normal random variate taking on a value less than or equal to some specified number -- is (as always) the integral of the density above.  This has no simple closed-form solution, so we typically just write:

<math>F(x) \equiv \Phi_{\mu, \sigma^{2}}(x) = \int_{-\infty}^{x} \phi_{\mu, \sigma^{2}}(t) \, dt.</math>

The figures below show several normal density curves and their corresponding CDFs.

[[File:StatDist.Normals.png | 512px]]

[[File:StatDist.NormalCDFs.png | 512px | Normal cumulative distribution functions]]
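
To make these formulas concrete numerically, here is a minimal sketch in Python (assuming the numpy and scipy libraries are available; the mean, standard deviation, and evaluation points are arbitrary example values) that evaluates the normal density and CDF and checks the density formula above directly:

 # Minimal sketch: evaluating the normal PDF f(x) and CDF F(x) numerically.
 # Assumes numpy and scipy; mu, sigma, and x are arbitrary example values.
 import numpy as np
 from scipy.stats import norm
 mu, sigma = 5.0, 2.0                       # mean and standard deviation (variance = sigma**2)
 x = np.array([1.0, 5.0, 9.0])              # points at which to evaluate the density and CDF
 pdf = norm.pdf(x, loc=mu, scale=sigma)     # f(x)
 cdf = norm.cdf(x, loc=mu, scale=sigma)     # F(x) = P(X <= x), computed numerically
 manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
 print("f(x):", pdf)
 print("F(x):", cdf)
 print("density formula matches scipy:", np.allclose(pdf, manual))
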
 
 
== Bases for the Normal Distribution ==

The most common justification for the normal distribution has its roots in the ''central limit theorem''.  Consider <math>N</math> independent, real-valued random variates <math>X_{i}</math>, <math>i = 1, 2, \ldots, N</math>, each with finite mean <math>\mu_{i}</math> and variance <math>\sigma^{2}_{i} > 0</math>.  If we consider a new variable <math>X</math> defined as the sum of these variables:

<math>X = \sum_{i=1}^{N} X_{i}</math>

then we know that

<math> \text{E}(X) = \sum_{i=1}^{N} \mu_{i} </math>

and

<math> \text{Var}(X) = \sum_{i=1}^{N} \sigma^{2}_{i} </math>

The central limit theorem states that:

<math> \underset{N \rightarrow \infty}{\lim} X = \underset{N \rightarrow \infty}{\lim} \sum_{i=1}^{N} X_{i} \overset{D}{\rightarrow} N(\cdot) </math>

where the notation <math>\overset{D}{\rightarrow}</math> indicates convergence in distribution.  That is, as <math>N</math> gets sufficiently large, the distribution of the sum of <math>N</math> independent random variates with finite mean and variance converges to a normal distribution.  As such, we often think of a normal distribution as being appropriate when the observed variable <math>X</math> can take on a range of continuous values, and when the observed value of <math>X</math> can be thought of as the product of a large number of relatively small, independent "shocks" or perturbations.
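
A small simulation can illustrate the theorem; the following is only a sketch (the uniform summands, the number of summands <math>N</math>, and the replication count are arbitrary choices, and it assumes numpy is available): the standardized sums of non-normal variates behave approximately like <math>N(0,1)</math> draws.

 # Minimal sketch: sums of many independent (here uniform) variates look approximately normal.
 import numpy as np
 rng = np.random.default_rng(0)
 N, reps = 50, 100_000                     # arbitrary: summands per sum, number of simulated sums
 sums = rng.uniform(0.0, 1.0, size=(reps, N)).sum(axis=1)   # X = X_1 + ... + X_N, X_i ~ Uniform(0,1)
 print("simulated mean:", sums.mean(), " theory:", N * 0.5)      # E(X) = sum of the means
 print("simulated var: ", sums.var(),  " theory:", N / 12)       # Var(X) = sum of the variances
 z = (sums - N * 0.5) / np.sqrt(N / 12)    # standardize the simulated sums
 print("share within +/-1.96:", np.mean(np.abs(z) < 1.96))       # close to 0.95 if approximately N(0,1)
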
== Properties of the Normal Distribution ==

* A normal variate <math>X</math> has support in <math>\mathfrak{R}</math>.
* The normal is a two-parameter distribution, where <math>\mu \in (-\infty, \infty)</math> and <math>\sigma^{2} \in (0, \infty)</math>.
* The normal distribution is always symmetrical (<math>M_{3} = 0</math>) and mesokurtic.  
* The normal distribution is preserved under a linear transformation.  That is, if <math>X \sim N(\mu,\sigma^{2})</math>, then <math>aX + b \sim N(a\mu + b, a^{2} \sigma^{2})</math>. (Why?  Recall our earlier results on <math>\mu</math> and <math>\sigma^{2}</math>; a quick simulation check of this property appears just below the list.)  
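
The closure-under-linear-transformation property can be checked by simulation; the following is only a sketch, with arbitrary values of <math>a</math>, <math>b</math>, <math>\mu</math>, and <math>\sigma</math>, assuming numpy is available.

 # Minimal sketch: if X ~ N(mu, sigma^2), then a*X + b has mean a*mu + b and variance a^2 * sigma^2.
 import numpy as np
 rng = np.random.default_rng(1)
 mu, sigma, a, b = 2.0, 3.0, -1.5, 4.0     # arbitrary example values
 x = rng.normal(mu, sigma, size=1_000_000)
 y = a * x + b
 print("mean of aX + b:", y.mean(), " theory:", a * mu + b)
 print("var  of aX + b:", y.var(),  " theory:", a ** 2 * sigma ** 2)
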
== The Standard Normal Distribution ==

One linear transformation is especially useful:  

<math>
\begin{align}
b & = \frac{-\mu}{\sigma} \\
a & = \frac{1}{\sigma}
\end{align}
</math>

This yields:

<math>
\begin{align}
aX + b  & \sim  N(a\mu+b, a^{2} \sigma^{2}) \\  
        &  \sim N(0,1)  
\end{align}
</math>

This is the ''standard normal distribution'', whose density function is often denoted <math>\phi(\cdot)</math>; we say that "<math>X</math> is distributed as standard normal."  We can also get this by transforming ("standardizing") the normal variate <math>X</math>:

* If <math>X \sim N(\mu,\sigma^{2})</math>, then <math>Z  =  \frac{(X - \mu)}{\sigma} \sim N(0,1)</math>.  
* The density function then reduces to:

<math> f(z) \equiv \phi(z) = \frac{1}{\sqrt{2\pi}} \exp \left[ - \frac{z^{2}}{2} \right] </math>

Similarly, we often write the CDF for the standard normal as <math>\Phi(\cdot)</math>.
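
As a concrete illustration of standardizing (again only a sketch, with made-up values of <math>\mu</math>, <math>\sigma</math>, and <math>x</math>, assuming scipy is available), a probability computed on the original scale agrees with the one computed from the standardized <math>Z</math> score and <math>\Phi(\cdot)</math>:

 # Minimal sketch: standardizing a normal variate and using the standard normal CDF Phi.
 from scipy.stats import norm
 mu, sigma = 100.0, 15.0                   # arbitrary example: X ~ N(100, 15^2)
 x = 130.0
 z = (x - mu) / sigma                      # Z = (X - mu) / sigma ~ N(0, 1)
 p_direct = norm.cdf(x, loc=mu, scale=sigma)   # P(X <= 130) on the original scale
 p_standardized = norm.cdf(z)                  # Phi(z) from the standard normal
 print("z =", z)
 print("P(X <= 130):", p_direct, " Phi(z):", p_standardized)   # the two agree
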
== Why do we care about the normal distribution? ==

The normal distribution's importance lies in its relationship to the central limit theorem.  As we'll discuss at more length later, the central limit theorem means that as one's sample size increases, the distribution of sample means (or other estimates) approaches a normal distribution.

==Additional points needed on the normal ==

[[User:Philip Schrodt|Philip Schrodt]] 07:00, 13 July 2011 (PDT)
*More extended discussion of the CLT, and a note that if we are dealing with a data generating process where the "error" is the average (or cumulative) effect of a large number of random variables with a variety of distributions, the CLT tells us that the net effect will be normally distributed. This, in turn, explains why linear models that assume Normally distributed error---regression and ANOVA---have proven to be so robust in practice
*Link to a number of examples of normally distributed data...should be easy to find these on the web. E.g. the classical height. Maybe SAT scores, though these are artificially normal
*ref to the wikipedia article; there is also a nice graphic to snag from there---introductory sidebar---which shows the standard normal
*sidebar on the log-normal?
*something about the bivariate normal and some nice graphics of this?
*sidebar on the issue of fat tails and how these destroyed the economy in 2007?---there is a fairly readable Wired article on this: http://www.wired.com/techbiz/it/magazine/17-03/wp_quant

= The <math>\chi^{2}</math> Distribution =

The chi-square (<math>\chi^{2}</math>) distribution is a one-parameter distribution defined only for nonnegative values. If <math>Z \sim N(0,1)</math>, then <math>Z^{2} \sim \chi^{2}_{1}</math>.  That is, ''the square'' of an <math>N(0,1)</math> variable is chi-squared with one degree of freedom. The fact that the square of a standard normal variate is a one-degree-of-freedom chi-square variable also explains why (e.g.) a chi-squared variate is only defined for nonnegative real numbers.  If <math>W_{1},W_{2},...,W_{k}</math> are all independent <math>\chi^{2}_{1}</math> variables, then <math>\sum_{i=1}^{k}W_{i} \sim \chi^{2}_{k}</math>. (The sum of <math>k</math> independent chi-squared variables is chi-squared with <math>k</math> degrees of freedom.)  By extension, the sum of the squares of <math>k</math> independent <math>N(0,1)</math> variables is also <math>\sim \chi^{2}_{k}</math>.  

The <math>\chi^{2}</math> distribution is positively skewed, with <math>\text{E}(W) = k</math> and
<math>\text{Var}(W) = 2k.</math>

The figure below presents five <math>\chi^{2}</math> densities with different values of <math>k</math>.

[[File:StatDist.ChiSquares.png | 512px]]

 Need to define degrees of freedom here

=== Characteristics of the <math>\chi^{2}</math> Distribution ===

If <math>W_{j}</math> and <math>W_{k}</math> are independent <math>\chi^{2}_{j}</math> and <math>\chi^{2}_{k}</math> variables, respectively, then <math>W_{j} + W_{k}</math> is <math>\sim \chi^{2}_{j+k}</math>; this result can be extended to any number of independent chi-squared variables. This in turn implies the result that the sum of the squares of <math>k</math> independent <math>N(0,1)</math> variables is also <math>\sim \chi^{2}_{k}</math>.
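
The sum-of-squared-normals construction can be checked by simulation; the sketch below is illustrative only (the degrees of freedom <math>k</math> and the replication count are arbitrary, and numpy and scipy are assumed available), comparing the simulated moments to <math>\text{E}(W) = k</math> and <math>\text{Var}(W) = 2k</math>.

 # Minimal sketch: the sum of k squared N(0,1) draws behaves like a chi-squared variate with k d.f.
 import numpy as np
 from scipy.stats import chi2
 rng = np.random.default_rng(2)
 k, reps = 5, 200_000                      # arbitrary degrees of freedom and replication count
 w = (rng.standard_normal(size=(reps, k)) ** 2).sum(axis=1)   # W = Z_1^2 + ... + Z_k^2
 print("simulated E(W):", w.mean(), " theory:", k)            # E(W) = k
 print("simulated Var(W):", w.var(), " theory:", 2 * k)       # Var(W) = 2k
 print("simulated 95th pct:", np.quantile(w, 0.95), " chi2.ppf:", chi2.ppf(0.95, df=k))
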
==Derivation of the <math>\chi^{2}</math> from the gamma function==

Gill discusses the <math>\chi^{2}</math> distribution as a special case of the gamma PDF.  That's fine, but there's actually a much more intuitive way of thinking about it, and one that comports more closely with how it is (most commonly) used in statistics.  Formally, a variable <math>W</math> that is distributed as <math>\chi^{2}</math> with <math>k</math> degrees of freedom has density:

<math>
\begin{align}
f(w) & = \frac{1}{2^{\frac{k}{2}} \Gamma(\frac{k}{2})} w^{\frac{k-2}{2}} \exp \left[ \frac{-w}{2} \right] \\
     & = \frac{w^{\frac{k-2}{2}} \exp(\frac{-w}{2})}{2^{\frac{k}{2}} \Gamma(\frac{k}{2})}
\end{align}
</math>

where <math>\Gamma(k) = \int_{0}^{\infty} t^{k - 1} \exp(-t) \, dt</math> is the gamma integral (see, e.g., Gill, p. 222).  As with the normal distribution, the need to write the distribution in this fashion reflects the fact that it has no closed-form solution.  The corresponding CDF is

<math>
F(w)=\frac{\gamma(k/2,w/2)}{\Gamma(k/2)}
</math>

where <math>\Gamma(\cdot)</math> is as before and <math>\gamma(\cdot)</math> is the [http://en.wikipedia.org/wiki/Incomplete_Gamma_function lower incomplete gamma function].  We write this as <math>W \sim \chi^{2}_{k}</math> (one also occasionally sees <math>W \sim \chi^{2}(k)</math>, with the degrees of freedom in parentheses), and say "<math>W</math> is distributed as chi-squared with <math>k</math> degrees of freedom."
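
For the numerically inclined, the CDF above can be checked against scipy's regularized lower incomplete gamma function; this is only a sketch, with an arbitrary <math>k</math> and arbitrary evaluation points.

 # Minimal sketch: F(w) = gamma(k/2, w/2) / Gamma(k/2), i.e. the regularized lower
 # incomplete gamma function, matches the chi-squared CDF.
 import numpy as np
 from scipy.special import gammainc        # gammainc(a, x) is already divided by Gamma(a)
 from scipy.stats import chi2
 k = 6                                     # arbitrary degrees of freedom
 w = np.array([1.0, 5.0, 12.0])            # arbitrary evaluation points
 cdf_gamma = gammainc(k / 2, w / 2)
 cdf_scipy = chi2.cdf(w, df=k)
 print(cdf_gamma)
 print(cdf_scipy)
 print("agree:", np.allclose(cdf_gamma, cdf_scipy))
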
 
 
 
==Additional points needed on the chi-square ==

[[User:Philip Schrodt|Philip Schrodt]] 07:00, 13 July 2011 (PDT)

*Probably want to mention the use in contingency tables here, since the connection isn't obvious.
*Agresti and Finlay state this was introduced by Pearson in 1900, apparently in the context of contingency tables---confirm this, any sort of story here?
*As df becomes very large, the chi-square approximates the normal; this is an asymptotic distribution and, for practical purposes, can be used if df > 50
*Discuss more about the assumption of statistical independence?
*Chi-square as the test for comparing whether an observed frequency fits a known distribution
 
 
 
= Student's <math>t</math> Distribution =

For a variable <math>X</math> which is distributed as <math>t</math> with <math>k</math> degrees of freedom, the PDF is:

<math>
f(x) = \frac{\Gamma(\frac{k+1}{2})} {\sqrt{k\pi}\,\Gamma(\frac{k}{2})} \left(1+\frac{x^2}{k} \right)^{-(\frac{k+1}{2})}
</math>

where once again <math>\Gamma(\cdot)</math> is the gamma integral.  We write <math>X \sim t_{k}</math>, and say "<math>X</math> is distributed as Student's <math>t</math> with <math>k</math> degrees of freedom."  The figure below presents <math>t</math> densities for five different values of <math>k</math>, along with a standard normal density for comparison.

[[File:StatDist.tDists.png | 512px]]

The t-distribution is sometimes known as "Student's t", after a then-anonymous "student" of the statistician Karl Pearson. The story, from Wikipedia:

<blockquote>
The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name).  Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of stout. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown to fellow statisticians.
</blockquote>

<!-- http://en.wikipedia.org/wiki/Student%27s_t-test -->
  
Note a few things about <math>t</math>:

* The mean/mode/median of a <math>t</math>-distributed variate is zero, and its variance is <math>\frac{k}{k - 2}</math> (for <math>k > 2</math>).  
* <math>t</math> looks like a standard normal distribution (symmetrical, bell-shaped) but has thicker "tails" (read: higher probabilities of draws being relatively far from the mean/mode).  However...
* ...as <math>k</math> gets larger, <math>t</math> converges to a standard normal distribution; at or above <math>k = 30</math> or so, the two are effectively indistinguishable.

The importance of the <math>t</math> distribution lies in its relationship to the normal and chi-square distributions.  In particular, if <math>Z \sim N(0,1)</math> and <math>W \sim \chi^{2}_{k}</math>, and <math>Z</math> and <math>W</math> are independent, then  

<math>\frac{Z}{\sqrt{W/k}} \sim t_{k} </math>

That is, the ratio of an <math>N(0,1)</math> variable and a (properly transformed) chi-squared variable follows a <math>t</math> distribution, with d.f. equal to the number of d.f. of the chi-squared variable.  Of course, this also means that <math>\frac{Z^{2}}{W/k} \sim t^{2}_{k}</math>.

Since we know that <math>Z^{2} \sim \chi^{2}_{1}</math>, this means that another way to derive the <math>t</math> distribution is from the ratio of a <math>\chi^{2}_{1}</math> variate to a <math>\chi^{2}_{k}</math> variate.  
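
This ratio construction is easy to verify by simulation; the sketch below is illustrative only (arbitrary <math>k</math> and replication count; numpy and scipy are assumed available), building <math>t</math>-like draws from independent normal and chi-squared draws.

 # Minimal sketch: Z / sqrt(W/k), with Z ~ N(0,1) and W ~ chi-squared_k independent, behaves like t_k.
 import numpy as np
 from scipy.stats import t
 rng = np.random.default_rng(3)
 k, reps = 4, 200_000                      # arbitrary degrees of freedom and replication count
 z = rng.standard_normal(reps)             # Z ~ N(0, 1)
 w = rng.chisquare(k, size=reps)           # W ~ chi-squared with k d.f.
 ratio = z / np.sqrt(w / k)
 print("simulated variance:", ratio.var(), " theory k/(k-2):", k / (k - 2))
 print("simulated 97.5th pct:", np.quantile(ratio, 0.975), " t.ppf:", t.ppf(0.975, df=k))
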
==Additional points needed on the t distribution ==

[[User:Philip Schrodt|Philip Schrodt]] 07:00, 13 July 2011 (PDT)

*May want to note that it is ubiquitous in the inference on regression coefficients
*Might want to note somewhere---this might go earlier in the discussion of df---that in most social science research (e.g. survey research and time-series cross-sections), the sample sizes are well above the point where the t is asymptotically normal. The t is actually important only in very small samples, though these can be found in situations such as small subsamples in survey research (are Hispanic ferret owners in Wyoming more likely to support the Tea Party?), situations where the population itself is small (e.g. state membership in the EU, Latin America, or ECOWAS), and experiments with a small number of subjects or cases (this is commonly found in medical research, for example, and it also motivated Gosset's original development of the test, albeit with yeast and hops---we presume---rather than experimental subjects). In these instances, using the conventional normal approximation to the t---in particular, the rule-of-thumb of looking for standard errors less than twice the size of the coefficient estimate to establish two-tailed 0.05 significance---will be misleading.

= The <math>F</math> Distribution =

An <math>F</math> distribution is the ratio of two chi-squared variates, each divided by its degrees of freedom. If <math>W_{1}</math> and <math>W_{2}</math> are independent and <math>\sim \chi^{2}_{k}</math> and <math>\chi^{2}_{\ell}</math>, respectively,  then

<math>\frac{W_{1}/k}{W_{2}/\ell} \sim F_{k,\ell}</math>

That is, the ratio of two chi-squared variables, each divided by its degrees of freedom, is distributed as <math>F</math> with d.f. equal to the number of d.f. in the numerator and denominator variables, respectively.  

Formally, if <math>X</math> is distributed as <math>F</math> with <math>k</math> and <math>\ell</math> degrees of freedom, then the PDF of <math>X</math> is:

<math>
f(x) = \frac{\left(\frac{k\,x}{k\,x + \ell}\right)^{k/2} \left(1-\frac{k\,x}{k\,x + \ell}\right)^{\ell/2}}{x\; \mathrm{B}(k/2, \ell/2)} 
</math>

where <math>\mathrm{B}(\cdot)</math> is the beta function; that is, <math>\mathrm{B}(x,y) = \int_0^1t^{x-1}(1-t)^{y-1}\,dt</math>.  We write <math>X \sim F_{k,\ell}</math>, and say "<math>X</math> is distributed as <math>F</math> with <math>k</math> and <math>\ell</math> degrees of freedom."

The <math>F</math> is a two-parameter distribution, with degrees of freedom parameters (say <math>k</math> and <math>\ell</math>), both of which are limited to the positive integers.  An <math>F</math> variate <math>X</math> takes values only on the non-negative real line; it has expected value <math>\text{E}(X) = \frac{\ell}{\ell - 2}</math> (for <math>\ell > 2</math>), which implies that the mean of an <math>F</math>-distributed variable converges on 1.0 as <math>\ell \rightarrow \infty</math>.  Likewise, it has variance
<math>\text{Var}(X) = \frac{2\,\ell^2\,(k+\ell-2)}{k (\ell-2)^2 (\ell-4)} </math> (for <math>\ell > 4</math>), which bears no simple relationship to either <math>k</math> or <math>\ell</math>.

The <math>F</math> distribution is (generally) positively skewed.  Examples of some <math>F</math> densities with different values of <math>k</math> and <math>\ell</math> are presented in the figure below.

[[File:StatDist.FDists.png | 512px]]

If <math>X \sim F(k, \ell)</math>, then <math>\frac{1}{X} \sim F(\ell, k)</math> (because <math>\frac{1}{X} = \frac{W_{2}/\ell}{W_{1}/k}</math>, which is again a ratio of scaled chi-squared variates). In addition, the square of a <math>t_{k}</math>-distributed variable is <math>\sim F(1,k)</math> (''why''? -- take the formula for <math>t</math>, and square it...)
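
These relationships can also be checked numerically; the sketch below is illustrative only (arbitrary <math>k</math>, <math>\ell</math>, and replication count; numpy and scipy are assumed available), constructing <math>F</math> draws from chi-squared draws and confirming the <math>1/X</math> and <math>t^{2}</math> connections.

 # Minimal sketch: (W1/k) / (W2/ell) behaves like F_{k,ell}; 1/X reverses the d.f.; t_k^2 behaves like F_{1,k}.
 import numpy as np
 from scipy.stats import f
 rng = np.random.default_rng(4)
 k, ell, reps = 3, 10, 200_000             # arbitrary degrees of freedom and replication count
 w1 = rng.chisquare(k, size=reps)
 w2 = rng.chisquare(ell, size=reps)
 x = (w1 / k) / (w2 / ell)                 # X ~ F with (k, ell) degrees of freedom
 print("simulated E(X):", x.mean(), " theory ell/(ell-2):", ell / (ell - 2))
 print("X    95th pct:", np.quantile(x, 0.95),       " f.ppf:", f.ppf(0.95, k, ell))
 print("1/X  95th pct:", np.quantile(1.0 / x, 0.95), " f.ppf:", f.ppf(0.95, ell, k))
 t_draws = rng.standard_normal(reps) / np.sqrt(rng.chisquare(k, size=reps) / k)   # t_k draws
 print("t^2  95th pct:", np.quantile(t_draws ** 2, 0.95), " f.ppf:", f.ppf(0.95, 1, k))
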
 
 
 
==Additional points needed on the F distribution ==

[[User:Philip Schrodt|Philip Schrodt]] 10:00, 13 July 2011 (PDT)

*Discovered by Fisher in 1922, hence "F"
*Mention how it will be used for <math>R^2</math> and ANOVA: <math>F = \frac{MS_{between}}{MS_{within}}</math>  
*Square of a <math>t_k</math> statistic is an <math>F_{1,k}</math> statistic

= Summary: Relationships Among Continuous Distributions =

The substantive importance of all these distributions will become apparent as we move on to sampling distributions and statistical inference. In the meantime, it is useful to consider the relationships among the four distributions we discussed above.

[[File:Continuous.dists.png | 512px]]
 
 
 
 
 
 
<!--DO NOT EDIT THE REFERENCE SECTION-->
=References=
{{Reflist}}

=Discussion questions=
#
#
#
#
#

=Problems=
#
#
#
#
#

=Glossary=
<!-- Here add any keywords or terms introduced on this page. Add them in a list like:
:*[[Def:newterm1]]
:*[[Def:newterm3]]
Do not edit above this line.-->
:*[[Def: ]]
:*[[Def: ]]
:*[[Def: ]]

<!--Do not edit below this line.-->
__FORCETOC__

Revision as of 09:01, 13 July 2011


Objectives

Introduction

In the previous chapter we discussed probability theory, which we expressed in terms of a variable $X$. We defined $X$ as a set of realizations of some process, which in turn is governed by rules of probability regarding potential outcomes in the sample space.

The variables we were talking about have been what are called random variables, which means that they have a probability distribution. As we noted before, broadly speaking, there are two kinds of random variables: discrete and continuous.

Discrete variables can take on any one of several distinct, mutually-exclusive values.

  • Congressperson's ideology score {0,1,2,3...,100}
  • An individual's political affiliation (Democrat, Republican, Independent}
  • Whether or not a country is a member of European Union (true/false)

A Continuous variable can take on any value in its range.

  • Individual income
  • National population

This chapter focuses on a family of continuous distributions that are the most widely used in statistical inference, and are found in a wide variety of contexts, both applied and theoretical. The Normal distribution is the well-known "bell-shaped curve" that most students usually encounter first in the artificial context of academic testing, but due to a powerful result called the Central Limit Theorem, occurs in a wide variety of uncontrolled situations where the value of a random variables is determined by the average effect of a large number of random variables with any combination of distributions. The χ2, t and F distributions can be derived from various products of normally-distributed variables, and are used extensively in statistical inference and applied statistics, so it's useful to understand them in a bit of depth.

Need to do

<a href="User:Philip Schrodt">Philip Schrodt</a> 06:57, 13 July 2011 (PDT)

  • Probably need to get most of the probability chapter---which are the moment hasn't been started---written before this one. In particular, will the pdf and cdf be defined there or here?
  • Add some of the discrete distributions, particularly the binomial
  • Add the uniform?
  • Do we add---or link to on another page---the derivation of the mean and standard errors for these: that code is available in CCL on an assortment of places on the web

The Normal Distribution

We are all used to seeing normal distributions described, and to hearing that something is "normally distributed." We know that a normal distribution is "bell-shaped," and symmetrical, and probably that it has some mean and some standard deviation.

Formally, if X is a normally distributed variate with mean μ and variance σ2, then:

<img _fckfakelement="true" _fck_mw_math="f(x) = \frac{1}{\sigma \sqrt{2\pi}} \text{exp} \left( - \frac{(x - \mu)^{2}}{2 \sigma^{2}} \right)" src="/images/math/0/5/c/05c01fdab44e6d59e0edc24028e1206a.png" />.


We denote this X˜N(μ,σ2), and say ``X is distributed normally with mean mu and variance sigma squared. The symbol φ is often used as a shorthand to represent the normal density in \eqref{normalden}:

<img _fckfakelement="true" _fck_mw_math="X \sim \phi_{\mu, \sigma^{2}}" src="/images/math/d/6/4/d64347ee6feb5ed546c6c65e3674dfb5.png" />.

The corresponding normal CDF -- which is the probability of a normal random variate taking on a value less than or equal to some specified number -- is (as always) the indefinite integral of \eqref{normalden}. This has no simple closed-form solution, so we typically just write:

<img _fckfakelement="true" _fck_mw_math="F(x) \equiv \Phi_{\mu, \sigma^{2}}(x) = \int \phi_{\mu, \sigma^{2}} f(x) d x." src="/images/math/4/b/0/4b02e1a28b9bbc37b9bec48dcc04b239.png" />

Here are a bunch of normal curves

<img src="/images/thumb/5/55/StatDist.Normals.png/512px-StatDist.Normals.png" _fck_mw_filename="StatDist.Normals.png" _fck_mw_width="512" alt="StatDist.Normals.png" />


<img src="/images/thumb/b/b7/StatDist.NormalCDFs.png/512px-StatDist.NormalCDFs.png" _fck_mw_filename="StatDist.NormalCDFs.png" _fck_mw_width="512" alt="Normal cumulative distribution functions" />

Bases for the Normal Distribution

The most common justification for the normal distribution has its roots in the 'central limit theorem'. Consider i = 1,2,...N independent, real-valued random variates $Xi$, each with finite mean $μi$ and variance <img _fckfakelement="true" _fck_mw_math="\sigma^{2}_{i} > 0" src="/images/math/8/e/3/8e3e18b02caaa63b535d363869d670c9.png" />. If we consider a new variable $X$ defined as the sum of these variables:

<img _fckfakelement="true" _fck_mw_math="X = \sum_{i=1}^{N} X_{i}" src="/images/math/a/7/5/a752c37c7aaf9b5d42055a00f9b5fd37.png" />

then we know that

<img _fckfakelement="true" _fck_mw_math=" \text{E}(X) = \sum_{i=1}^{N} \mu_{i} " src="/images/math/d/a/7/da7eb476d715554622bbefea75301103.png" />

and

<img _fckfakelement="true" _fck_mw_math=" \text{Var}(X) = \sum_{i=1}^{N} \sigma^{2}_{i} " src="/images/math/7/7/d/77de1add6aafc0249d728571a9683b18.png" />


The central limit theorem states that:

<math>\underset{N \rightarrow \infty}{\lim} X = \underset{N \rightarrow \infty}{\lim} \sum_{i=1}^{N} X_{i} \overset{D}{\rightarrow} N(\cdot)</math>

where the notation <math>\overset{D}{\rightarrow}</math> indicates convergence in distribution. That is, as <math>N</math> gets sufficiently large, the distribution of the (suitably standardized) sum of <math>N</math> independent random variates with finite means and variances converges to a normal distribution. As such, we often think of a normal distribution as being appropriate when the observed variable <math>X</math> can take on a range of continuous values, and when the observed value of <math>X</math> can be thought of as the result of a large number of relatively small, independent "shocks" or perturbations.
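As a concrete illustration, the following sketch sums a large number of independent, decidedly non-normal shocks and compares the standardized sums to a standard normal density. It assumes NumPy and Matplotlib are available; the number of shocks, the replication count, and the particular mixture of uniform and exponential shocks are arbitrary choices for illustration.

<pre>
# A minimal sketch of the central limit theorem at work.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
N = 50          # number of independent "shocks" summed into each observation
reps = 10_000   # number of observations of the sum

# Each X_i has a different, non-normal distribution.
shocks = (rng.uniform(-1, 1, size=(reps, N // 2)),
          rng.exponential(scale=0.5, size=(reps, N // 2)))
X = np.concatenate(shocks, axis=1).sum(axis=1)

# Standardize the sums and compare the histogram to the standard normal density.
Z = (X - X.mean()) / X.std()
grid = np.linspace(-4, 4, 200)
plt.hist(Z, bins=60, density=True, alpha=0.5, label="standardized sums")
plt.plot(grid, np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi), label="N(0,1) density")
plt.legend()
plt.show()
</pre>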

==Properties of the Normal Distribution==

* A normal variate <math>X</math> has support in <math>\mathfrak{R}</math>.
* The normal is a two-parameter distribution, where <math>\mu \in (-\infty, \infty)</math> and <math>\sigma^{2} \in (0, \infty)</math>.
* The normal distribution is always symmetrical (<math>M_{3} = 0</math>) and mesokurtic.
* The normal distribution is preserved under a linear transformation. That is, if <math>X \sim N(\mu, \sigma^{2})</math>, then <math>aX + b \sim N(a\mu + b, a^{2}\sigma^{2})</math>. (Why? Recall our earlier results on <math>\mu</math> and <math>\sigma^{2}</math>.) A quick numerical check follows this list.
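The sketch below, assuming NumPy, is a quick simulation check (not a proof) of the linear-transformation property; the values of <math>\mu</math>, <math>\sigma</math>, <math>a</math>, and <math>b</math> are arbitrary.

<pre>
# Check that aX + b of normal draws has the transformed mean and variance.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
a, b = 0.5, -1.0

x = rng.normal(mu, sigma, size=100_000)
y = a * x + b

print(y.mean(), a * mu + b)       # both close to 0.0
print(y.var(), a**2 * sigma**2)   # both close to 2.25
</pre>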


==The Standard Normal Distribution==

One linear transformation is especially useful:

<math>\begin{align} b & = \frac{-\mu}{\sigma} \\ a & = \frac{1}{\sigma} \end{align}</math>.


This yields:

<math>\begin{align} aX + b & \sim N(a\mu+b, a^{2} \sigma^{2}) \\ & \sim N(0,1) \end{align}</math>

This is the standard normal distribution. We often denote its density <math>\phi(\cdot)</math>, and say that "<math>X</math> is distributed as standard normal." We can also get this by transforming ("standardizing") the normal variate <math>X</math>:

* If <math>X \sim N(\mu, \sigma^{2})</math>, then <math>Z = \frac{(X - \mu)}{\sigma} \sim N(0,1)</math>.
* The density function then reduces to:

<math>f(z) \equiv \phi(z) = \frac{1}{\sqrt{2\pi}} \exp \left[ - \frac{z^{2}}{2} \right]</math>

Similarly, we often write the CDF for the standard normal as <math>\Phi(\cdot)</math>.
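In practice, <math>\phi</math> and <math>\Phi</math> are evaluated numerically. The following sketch, assuming SciPy, standardizes a hypothetical observation and evaluates the standard normal density and CDF; the particular values of <math>\mu</math>, <math>\sigma</math>, and <math>x</math> are purely illustrative.

<pre>
# Standardizing an observation and evaluating phi and Phi with SciPy.
from scipy.stats import norm

mu, sigma = 100.0, 15.0
x = 130.0

z = (x - mu) / sigma                      # Z = (x - mu) / sigma
print(norm.pdf(z))                        # phi(z), the standard normal density at z
print(norm.cdf(z))                        # Phi(z), P(Z <= z)
print(norm.cdf(x, loc=mu, scale=sigma))   # same probability without standardizing
</pre>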

==Why do we care about the normal distribution?==

The normal distribution's importance lies in its relationship to the central limit theorem. As we'll discuss at more length later, the central limit theorem means that as one's sample size increases, the distribution of sample means (or other estimates) approaches a normal distribution.

==Additional points needed on the normal==

<a href="User:Philip Schrodt">Philip Schrodt</a> 07:00, 13 July 2011 (PDT)

* More extended discussion of the CLT, and a note that if we are dealing with a data generating process where the "error" is the average (or cumulative) effect of a large number of random variables with a variety of distributions, the CLT tells us that the net effect will be normally distributed. This, in turn, explains why linear models that assume normally distributed error---regression and ANOVA---have proven to be so robust in practice
* Link to a number of examples of normally distributed data...should be easy to find these on the web. E.g. the classic example of height. Maybe SAT scores, though these are artificially normal
* Ref to the Wikipedia article; there is also a nice graphic to snag from there---introductory sidebar---which shows the standard normal
* Sidebar on the log-normal?
* Something about the bivariate normal and some nice graphics of this?
* Sidebar on the issue of fat tails and how these destroyed the economy in 2007?---there is a fairly readable ''Wired'' article on this: http://www.wired.com/techbiz/it/magazine/17-03/wp_quant

=The <math>\chi^{2}</math> Distribution=

The chi-squared (<math>\chi^{2}</math>) distribution is a one-parameter distribution defined only over non-negative values. If <math>Z \sim N(0,1)</math>, then <math>Z^{2} \sim \chi^{2}_{1}</math>. That is, the square of an <math>N(0,1)</math> variable is chi-squared with one degree of freedom. The fact that the square of a standard normal variate is a one-degree-of-freedom chi-squared variable also explains why (e.g.) a chi-squared variate is only defined for non-negative real numbers. If <math>W_{1}, W_{2}, \ldots, W_{k}</math> are all independent <math>\chi^{2}_{1}</math> variables, then <math>\sum_{i=1}^{k}W_{i} \sim \chi^{2}_{k}</math>. (The sum of <math>k</math> independent one-degree-of-freedom chi-squared variables is chi-squared with <math>k</math> degrees of freedom.) By extension, the sum of the squares of <math>k</math> independent <math>N(0,1)</math> variables is also <math>\sim \chi^{2}_{k}</math>; a small simulation illustrating this appears below.
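The following sketch, assuming NumPy and SciPy, sums <math>k</math> squared standard normal draws and checks that the result behaves like a <math>\chi^{2}_{k}</math> variate; the value of <math>k</math> and the replication count are arbitrary.

<pre>
# Sum of k squared N(0,1) draws behaves like a chi-squared(k) variate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, reps = 5, 100_000

z = rng.standard_normal(size=(reps, k))
w = (z ** 2).sum(axis=1)              # sum of k squared standard normals

print(w.mean(), k)                    # E(W) = k
print(w.var(), 2 * k)                 # Var(W) = 2k
# Kolmogorov-Smirnov comparison against the chi-squared(k) CDF
print(stats.kstest(w, "chi2", args=(k,)))
</pre>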

The <math>\chi^{2}</math> distribution is positively skewed, with <math>\text{E}(W) = k</math> and <math>\text{Var}(W) = 2k</math>.

The figure below presents five <math>\chi^{2}</math> densities with different values of <math>k</math>.

<img src="/images/thumb/8/89/StatDist.ChiSquares.png/512px-StatDist.ChiSquares.png" _fck_mw_filename="StatDist.ChiSquares.png" _fck_mw_width="512" alt="StatDist.ChiSquares.png" />

Need to define degrees of freedom here

==Characteristics of the <math>\chi^{2}</math> Distribution==

If <math>W_{j}</math> and <math>W_{k}</math> are independent <math>\chi^{2}_{j}</math> and <math>\chi^{2}_{k}</math> variables, respectively, then <math>W_{j} + W_{k} \sim \chi^{2}_{j+k}</math>; this result can be extended to any number of independent chi-squared variables. This in turn implies that the sum of the squares of <math>k</math> independent <math>N(0,1)</math> variables is also <math>\sim \chi^{2}_{k}</math>.


==Derivation of the <math>\chi^{2}</math> from Gamma functions==

Gill discusses the <math>\chi^{2}</math> distribution as a special case of the gamma PDF. That's fine, but there's actually a much more intuitive way of thinking about it, and one that comports more closely with how it is (most commonly) used in statistics. Formally, a variable <math>W</math> that is distributed as <math>\chi^{2}</math> with <math>k</math> degrees of freedom has the density:

<math>f(w) = \frac{1}{2^{k/2} \, \Gamma(\frac{k}{2})} w^{\frac{k}{2}-1} \exp \left[ \frac{-w}{2} \right] = \frac{w^{\frac{k-2}{2}} \exp(\frac{-w}{2})}{2^{\frac{k}{2}} \Gamma(\frac{k}{2})}</math>

where <math>\Gamma(k) = \int_{0}^{\infty} t^{k - 1} \exp(-t) \, dt</math> is the gamma integral (see, e.g., Gill, p. 222). As with the normal distribution, the CDF has no simple closed-form solution, which is why the distribution is written in this fashion. The corresponding CDF is

<math>F(w) = \frac{\gamma(k/2, w/2)}{\Gamma(k/2)}</math>

where <math>\Gamma(\cdot)</math> is as before and <math>\gamma(\cdot)</math> is the [http://en.wikipedia.org/wiki/Incomplete_Gamma_function lower incomplete gamma function]. We write this as <math>W \sim \chi^{2}_{k}</math>, and say "<math>W</math> is distributed as chi-squared with <math>k</math> degrees of freedom." (One also occasionally sees <math>W \sim \chi^{2}(k)</math>, with the degrees of freedom in parentheses.)
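In practice this CDF is evaluated numerically. The sketch below, assuming SciPy, checks that the incomplete-gamma expression above matches the packaged chi-squared CDF; note that <code>scipy.special.gammainc</code> is already divided by <math>\Gamma(k/2)</math> (it is the regularized form), so it returns <math>F(w)</math> directly. The values of <math>k</math> and <math>w</math> are illustrative.

<pre>
# Chi-squared CDF via the (regularized) lower incomplete gamma function.
from scipy import special, stats

k, w = 4, 7.5
cdf_via_gamma = special.gammainc(k / 2, w / 2)   # gamma(k/2, w/2) / Gamma(k/2)
cdf_direct = stats.chi2.cdf(w, df=k)

print(cdf_via_gamma, cdf_direct)   # the two agree
</pre>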

==Additional points needed on the chi-square==

<a href="User:Philip Schrodt">Philip Schrodt</a> 07:00, 13 July 2011 (PDT)

* Probably want to mention the use in contingency tables here, since the connection isn't obvious.
* Agresti and Finlay state this was introduced by Pearson in 1900, apparently in the context of contingency tables---confirm this, any sort of story here?
* As df becomes very large, the chi-square approximates the normal; this is an asymptotic result and, for practical purposes, the normal approximation can be used if df > 50
* Discuss more about the assumption of statistical independence?
* Chi-square as the test for comparing whether an observed frequency distribution fits a known distribution

=Student's t Distribution=

For a variable <math>X</math> which is distributed as <math>t</math> with <math>k</math> degrees of freedom, the PDF is:

<math>f(x) = \frac{\Gamma(\frac{k+1}{2})}{\sqrt{k\pi}\,\Gamma(\frac{k}{2})} \left(1+\frac{x^2}{k} \right)^{-\left(\frac{k+1}{2}\right)}</math>

where once again <math>\Gamma(\cdot)</math> is the gamma integral. We write <math>X \sim t_{k}</math>, and say "<math>X</math> is distributed as Student's <math>t</math> with <math>k</math> degrees of freedom." The figure below presents <math>t</math> densities for five different values of <math>k</math>, along with a standard normal density for comparison.


<img src="/images/thumb/d/d8/StatDist.tDists.png/512px-StatDist.tDists.png" _fck_mw_filename="StatDist.tDists.png" _fck_mw_width="512" alt="StatDist.tDists.png" />

The <math>t</math> distribution is sometimes known as "Student's <math>t</math>", after a then-anonymous "student" of the statistician Karl Pearson. The story, from Wikipedia:

<blockquote>The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name). Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of stout. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown to fellow statisticians.</blockquote>

Note a few things about <math>t</math>:

* The mean/mode/median of a <math>t</math>-distributed variate is zero, and its variance is <math>\frac{k}{k - 2}</math> (for <math>k > 2</math>).
* <math>t</math> looks like a standard normal distribution (symmetrical, bell-shaped) but has thicker "tails" (read: higher probabilities of draws being relatively far from the mean/mode). However...
* ...as <math>k</math> gets larger, <math>t</math> converges to a standard normal distribution; at or above <math>k = 30</math> or so, the two are effectively indistinguishable.

The importance of the <math>t</math> distribution lies in its relationship to the normal and chi-squared distributions. In particular, if <math>Z \sim N(0,1)</math> and <math>W \sim \chi^{2}_{k}</math>, and <math>Z</math> and <math>W</math> are independent, then

<math>\frac{Z}{\sqrt{W/k}} \sim t_{k}</math>

That is, the ratio of an <math>N(0,1)</math> variable and a (properly transformed) chi-squared variable follows a <math>t</math> distribution, with d.f. equal to the number of d.f. of the chi-squared variable. Of course, this also means that <math>\frac{Z^{2}}{W/k} \sim t_{k}^{2}</math>.

Since we know that <math>Z^{2} \sim \chi^{2}_{1}</math>, this means that the square of a <math>t_{k}</math> variate can also be derived as the ratio of a <math>\chi^{2}_{1}</math> variate to a <math>\chi^{2}_{k}</math> variate divided by its degrees of freedom. A small simulation of the construction appears below.
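The following sketch, assuming NumPy and SciPy, builds <math>t = Z/\sqrt{W/k}</math> from independent standard normal and chi-squared draws and checks the result against the <math>t_{k}</math> distribution; the value of <math>k</math> and the replication count are arbitrary.

<pre>
# Constructing a t variate from Z ~ N(0,1) and W ~ chi-squared(k).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k, reps = 8, 100_000

z = rng.standard_normal(reps)
w = rng.chisquare(df=k, size=reps)
t = z / np.sqrt(w / k)

print(t.var(), k / (k - 2))               # variance should be near k/(k-2)
print(stats.kstest(t, "t", args=(k,)))    # compare against the t_k CDF
</pre>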


==Additional points needed on the t distribution==

<a href="User:Philip Schrodt">Philip Schrodt</a> 07:00, 13 July 2011 (PDT)

* May want to note that it is ubiquitous in inference on regression coefficients
* Might want to note somewhere---this might go earlier in the discussion of df---that in most social science research (e.g. survey research and time-series cross-sections), the sample sizes are well above the point where the t is asymptotically normal. The t is actually important only in very small samples, though these can be found in situations such as small subsamples in survey research (are Hispanic ferret owners in Wyoming more likely to support the Tea Party?), situations where the population itself is small (e.g. state membership in the EU, Latin America, or ECOWAS), and experiments with a small number of subjects or cases (this is commonly found in medical research, for example, and it also motivated Gosset's original development of the test, albeit with yeast and hops---we presume---rather than experimental subjects). In these instances, using the conventional normal approximation to the t---in particular, the rule-of-thumb of looking for coefficient estimates at least twice the size of their standard errors to establish two-tailed 0.05 significance---will be misleading.

=The F Distribution=

An <math>F</math> distribution is the ratio of two chi-squared variates, each divided by its degrees of freedom. If <math>W_{1}</math> and <math>W_{2}</math> are independent and <math>\sim \chi^{2}_{k}</math> and <math>\chi^{2}_{\ell}</math>, respectively, then <math>\frac{W_{1}/k}{W_{2}/\ell} \sim F_{k,\ell}</math>.

That is, the ratio of two chi-squared variables, each divided by its degrees of freedom, is distributed as <math>F</math> with d.f. equal to the number of d.f. in the numerator and denominator variables, respectively.

Formally, if <math>X</math> is distributed as <math>F</math> with <math>k</math> and <math>\ell</math> degrees of freedom, then the PDF of <math>X</math> is:

<math>f(x) = \frac{\left(\frac{k\,x}{k\,x + \ell}\right)^{k/2} \left(1-\frac{k\,x}{k\,x + \ell}\right)^{\ell/2}}{x\; \mathrm{B}(k/2, \ell/2)}</math>


where <math>\mathrm{B}(\cdot)</math> is the beta function; that is, <math>\mathrm{B}(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt</math>. We write <math>X \sim F_{k,\ell}</math>, and say "<math>X</math> is distributed as <math>F</math> with <math>k</math> and <math>\ell</math> degrees of freedom."

The <math>F</math> is a two-parameter distribution, with degrees of freedom parameters (say <math>k</math> and <math>\ell</math>), both of which are limited to the positive integers. An <math>F</math> variate <math>X</math> takes values only on the non-negative real line; it has expected value <math>\text{E}(X) = \frac{\ell}{\ell - 2}</math> (for <math>\ell > 2</math>), which implies that the mean of an <math>F</math>-distributed variable converges on 1.0 as <math>\ell \rightarrow \infty</math>. Likewise, it has variance

<math>\text{Var}(X) = \frac{2\,\ell^2\,(k+\ell-2)}{k (\ell-2)^2 (\ell-4)}</math> (for <math>\ell > 4</math>),

which bears no simple relationship to either <math>k</math> or <math>\ell</math>.

The <math>F</math> distribution is (generally) positively skewed. Examples of some <math>F</math> densities with different values of <math>k</math> and <math>\ell</math> are presented in the figure below.


<img src="/images/thumb/4/46/StatDist.FDists.png/512px-StatDist.FDists.png" _fck_mw_filename="StatDist.FDists.png" _fck_mw_width="512" alt="StatDist.FDists.png" />

If <math>X \sim F_{k, \ell}</math>, then <math>\frac{1}{X} \sim F_{\ell, k}</math> (because <math>\frac{1}{X} = \frac{1}{(W_{1}/k)/(W_{2}/\ell)} = \frac{W_{2}/\ell}{W_{1}/k}</math>). In addition, the square of a <math>t_{k}</math>-distributed variable is <math>\sim F_{1,k}</math> (''why?'' -- take the formula for <math>t</math>, and square it...). A short simulation checking both facts appears below.
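The sketch below, assuming NumPy and SciPy, constructs an <math>F_{k,\ell}</math> variate from two chi-squared draws and checks both the reciprocal relationship and the connection to the square of a <math>t_{k}</math> variate; the degrees of freedom and replication count are arbitrary.

<pre>
# Two checks on the F distribution: 1/F(k,l) ~ F(l,k), and t_k^2 ~ F(1,k).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
k, l, reps = 3, 10, 100_000

w1 = rng.chisquare(df=k, size=reps)
w2 = rng.chisquare(df=l, size=reps)
x = (w1 / k) / (w2 / l)                        # X ~ F(k, l)

print(stats.kstest(x, "f", args=(k, l)))       # X against F(k, l)
print(stats.kstest(1 / x, "f", args=(l, k)))   # 1/X against F(l, k)

t = rng.standard_t(df=k, size=reps)
print(stats.kstest(t ** 2, "f", args=(1, k)))  # t_k squared against F(1, k)
</pre>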

==Additional points needed on the F distribution==

<a href="User:Philip Schrodt">Philip Schrodt</a> 10:00, 13 July 2011 (PDT)

* Discovered by Fisher in 1922, hence "F"
* Mention how it will be used for <math>R^{2}</math> and ANOVA: <math>F = \frac{MS_{between}}{MS_{within}}</math>
* The square of a <math>t_{k}</math> statistic is an <math>F_{1,k}</math> statistic

=Summary: Relationships Among Continuous Distributions=

The substantive importance of all these distributions will become apparent as we move on to sampling distributions and statistical inference. In the meantime, it is useful to consider the relationships among the four distributions discussed above.

<img src="/images/thumb/2/2e/Continuous.dists.png/512px-Continuous.dists.png" _fck_mw_filename="Continuous.dists.png" _fck_mw_width="512" alt="Continuous.dists.png" />

=References=

<references group=""></references>

=Discussion questions=

=Problems=

=Glossary=

* [[Def: ]]
* [[Def: ]]
* [[Def: ]]

__FORCETOC__