## Statistical distributions

### From OPOSSEM


<!--
1. At the moment [11.07.07] I'm just putting the equations in-line, since I'm converting this from LaTeX. I'll move the equations to references later

2. Images courtesy of Chris Zorn
-->

# Objectives

# Introduction

In the previous chapter we discussed probability theory, which we expressed in terms of a variable $X$. We defined $X$ as a set of realizations of some process, which in turn is governed by rules of probability regarding potential outcomes in the sample space.

The variables we have been talking about are what are called *random variables*, meaning that they have a probability distribution. As we noted before, broadly speaking, there are two kinds of random variables: *discrete* and *continuous*.

*Discrete* variables can take on any one of several distinct, mutually exclusive values. For example:

* A congressperson's ideology score {0, 1, 2, 3, ..., 100}
* An individual's political affiliation (Democrat, Republican, Independent)
* Whether or not a country is a member of the European Union (true/false)

A *continuous* variable can take on *any* value in its range. For example:

* Individual income
* National population

This chapter focuses on a family of continuous distributions that are the most widely used in statistical inference and are found in a wide variety of contexts, both applied and theoretical. The *normal* distribution is the well-known "bell-shaped curve" that most students first encounter in the artificial context of academic testing, but, due to a powerful result called the central limit theorem, it also occurs in a wide variety of uncontrolled situations where the value of a random variable is determined by the average effect of a large number of other random variables with any combination of distributions. The $\chi^2$, $t$ and $F$ distributions can be derived from various combinations of normally distributed variables, and are used extensively in statistical inference and applied statistics, so it is useful to understand them in some depth.

− | + | </p> | |

− | + | <h2>Need to do</h2> | |

− | + | <p><a href="User:Philip Schrodt">Philip Schrodt</a> 06:57, 13 July 2011 (PDT) | |

− | + | </p> | |

− | + | <ul><li>Probably need to get most of the probability chapter---which are the moment hasn't been started---written before this one. In particular, will the pdf and cdf be defined there or here? | |

− | + | </li><li>Add some of the discrete distributions, particularly the binomial | |

− | + | </li><li>Add the uniform? | |

− | + | </li><li>Do we add---or link to on another page---the derivation of the mean and standard errors for these: that code is available in CCL on an assortment of places on the web | |

− | + | </li></ul> | |

# The Normal Distribution

We are all used to seeing normal distributions described, and to hearing that something is "normally distributed." We know that a normal distribution is "bell-shaped" and symmetrical, and probably that it has some mean and some standard deviation.

Formally, if $X$ is a *normally distributed* variate with mean $\mu$ and variance $\sigma^2$, then:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( - \frac{(x - \mu)^{2}}{2 \sigma^{2}} \right)$$

We denote this $X \sim N(\mu, \sigma^{2})$, and say "$X$ is distributed normally with mean $\mu$ and variance $\sigma^{2}$." The symbol $\phi$ is often used as shorthand for this normal density:

$$X \sim \phi_{\mu, \sigma^{2}}$$

The corresponding normal CDF---the probability of a normal random variate taking on a value less than or equal to some specified number---is, as always, the integral of the density up to that value. This has no simple closed-form solution, so we typically just write:

$$F(x) \equiv \Phi_{\mu, \sigma^{2}}(x) = \int_{-\infty}^{x} \phi_{\mu, \sigma^{2}}(t) \, dt$$

The figure below shows several normal densities with different means and variances, followed by the corresponding cumulative distribution functions:

![Normal density functions](/images/thumb/5/55/StatDist.Normals.png/512px-StatDist.Normals.png)

![Normal cumulative distribution functions](/images/thumb/b/b7/StatDist.NormalCDFs.png/512px-StatDist.NormalCDFs.png)
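As a quick numerical sketch (assuming SciPy is available; the function name `normal_pdf` is ours, not part of any library), we can evaluate the density formula above directly and compare it with `scipy.stats.norm`, which parameterizes the normal by its mean (`loc`) and *standard deviation* (`scale`):

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2), written out from the formula above."""
    return (1.0 / (sigma * math.sqrt(2 * math.pi))) * \
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

x = 1.3
print(normal_pdf(x, mu=1.0, sigma=2.0))   # hand-rolled formula
print(norm.pdf(x, loc=1.0, scale=2.0))    # same value from SciPy
print(norm.cdf(x, loc=1.0, scale=2.0))    # P(X <= 1.3); no closed form, computed numerically
```

Note that `scale` is $\sigma$, not $\sigma^2$; passing the variance instead of the standard deviation is a common mistake.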

## Bases for the Normal Distribution

The most common justification for the normal distribution has its roots in the *central limit theorem*. Consider $i = 1, 2, \ldots, N$ independent, real-valued random variates $X_i$, each with finite mean $\mu_i$ and variance $\sigma^{2}_{i} > 0$. If we consider a new variable $X$ defined as the sum of these variables:

$$X = \sum_{i=1}^{N} X_{i}$$

then we know that

$$\text{E}(X) = \sum_{i=1}^{N} \mu_{i}$$

and

$$\text{Var}(X) = \sum_{i=1}^{N} \sigma^{2}_{i}$$

− | |||

− | == | ||

− | |||

− | |||

− | |||

− | < | ||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | < | ||

− | |||

− | |||

The central limit theorem states that:

$$X = \sum_{i=1}^{N} X_{i} \overset{D}{\rightarrow} N(\cdot) \quad \text{as } N \rightarrow \infty$$

where the notation $\overset{D}{\rightarrow}$ indicates convergence in distribution. That is, as $N$ gets sufficiently large, the distribution of the (suitably standardized) sum of $N$ independent random variates with finite mean and variance converges to a normal distribution. As such, we often think of a normal distribution as being appropriate when the observed variable $X$ can take on a range of continuous values, and when the observed value of $X$ can be thought of as the product of a large number of relatively small, independent "shocks" or perturbations.
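The theorem is easy to see by simulation. The sketch below (assuming NumPy; the constants are arbitrary choices for illustration) sums $N = 50$ uniform draws, whose individual distribution is flat, and checks that the sum's moments match the formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum N = 50 independent Uniform(0, 1) draws, 100,000 times.
N, reps = 50, 100_000
sums = rng.uniform(0.0, 1.0, size=(reps, N)).sum(axis=1)

# Each summand has mean 1/2 and variance 1/12, so for the sum:
# E(X) = N/2 = 25 and Var(X) = N/12 ~= 4.17.
print(sums.mean())   # close to 25
print(sums.var())    # close to 50/12
# A histogram of `sums` is visibly bell-shaped even though each summand is uniform.
```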

## Properties of the Normal Distribution

* A normal variate $X$ has support on $\mathbb{R}$.
* The normal is a two-parameter distribution, where $\mu \in (-\infty, \infty)$ and $\sigma^{2} \in (0, \infty)$.
* The normal distribution is always symmetrical ($M_3 = 0$) and mesokurtic.
* The normal distribution is preserved under a linear transformation. That is, if $X \sim N(\mu, \sigma^{2})$, then $aX + b \sim N(a\mu + b, a^{2}\sigma^{2})$. (Why? Recall our earlier results on $\mu$ and $\sigma^2$.)

## The Standard Normal Distribution

One linear transformation is especially useful:

$$\begin{align} b & = \frac{-\mu}{\sigma} \\ a & = \frac{1}{\sigma} \end{align}$$

− | |||

− | One linear transformation is especially useful: | ||

− | |||

− | < | ||

− | \begin{align} | ||

− | |||

− | |||

− | \end{align} | ||

− | |||

− | |||

− | |||

This yields:

$$\begin{align} aX + b & \sim N(a\mu + b, a^{2} \sigma^{2}) \\ & \sim N(0, 1) \end{align}$$

This is the *standard normal distribution*. We often denote its density $\phi(\cdot)$, and say that "$X$ is distributed as standard normal." We can also get this by transforming ("standardizing") the normal variate $X$ directly:

* If $X \sim N(\mu, \sigma^{2})$, then $Z = \frac{(X - \mu)}{\sigma} \sim N(0, 1)$.
* The density function then reduces to:

$$f(z) \equiv \phi(z) = \frac{1}{\sqrt{2\pi}} \exp \left[ - \frac{z^{2}}{2} \right]$$

Similarly, we often write the CDF for the standard normal as $\Phi(\cdot)$.
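A quick simulation check of the standardization result (assuming NumPy; the particular $\mu = 10$, $\sigma = 3$ are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=3.0, size=200_000)   # X ~ N(10, 9)

# Standardize using the true mu and sigma: Z = (X - mu) / sigma.
z = (x - 10.0) / 3.0

print(z.mean())   # close to 0
print(z.std())    # close to 1
```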

## Why do we care about the normal distribution?

The normal distribution's importance lies in its relationship to the central limit theorem. As we'll discuss at more length later, the central limit theorem means that as one's sample size increases, the distribution of sample means (or other estimates) approaches a normal distribution.

## Additional points needed on the normal

[Philip Schrodt](User:Philip Schrodt) 07:00, 13 July 2011 (PDT)

* More extended discussion of the CLT, and a note that if we are dealing with a data-generating process where the "error" is the average (or cumulative) effect of a large number of random variables with a variety of distributions, the CLT tells us that the net effect will be normally distributed. This, in turn, explains why linear models that assume normally distributed error---regression and ANOVA---have proven to be so robust in practice
* Link to a number of examples of normally distributed data; these should be easy to find on the web, e.g. the classic example of height. Maybe SAT scores, though these are artificially normal
* Reference the Wikipedia article; there is also a nice graphic to snag from there---the introductory sidebar---which shows the standard normal
* Sidebar on the log-normal?
* Something about the bivariate normal, and some nice graphics of this?
* Sidebar on the issue of fat tails and how these destroyed the economy in 2007? There is a fairly readable Wired article on this: http://www.wired.com/techbiz/it/magazine/17-03/wp_quant

# The $\chi^2$ Distribution

The chi-square ($\chi^2$) distribution is a one-parameter distribution defined only over nonnegative values. If $Z \sim N(0,1)$, then $Z^{2} \sim \chi^{2}_{1}$. That is, the *square* of a $N(0,1)$ variable is chi-squared with one degree of freedom. The fact that the square of a standard normal variate is a one-degree-of-freedom chi-square variable also explains why (e.g.) a chi-squared variate is only defined for nonnegative real numbers. If $W_{1}, W_{2}, \ldots, W_{k}$ are all independent $\chi^{2}_{1}$ variables, then $\sum_{i=1}^{k} W_{i} \sim \chi^{2}_{k}$. (The sum of $k$ independent one-degree-of-freedom chi-squared variables is chi-squared with $k$ degrees of freedom.) By extension, the sum of the squares of $k$ independent $N(0,1)$ variables is also $\sim \chi^{2}_{k}$.

The $\chi^2$ distribution is positively skewed, with $\text{E}(W) = k$ and $\text{Var}(W) = 2k$.

The figure below presents five $\chi^2$ densities with different values of $k$.

![Chi-squared density functions](/images/thumb/8/89/StatDist.ChiSquares.png/512px-StatDist.ChiSquares.png)

    Need to define degrees of freedom here

### Characteristics of the $\chi^2$ Distribution

If $W_{j}$ and $W_{k}$ are independent $\chi^{2}_{j}$ and $\chi^{2}_{k}$ variables, respectively, then $W_{j} + W_{k} \sim \chi^{2}_{j+k}$; this result can be extended to any number of independent chi-squared variables. This in turn implies the earlier result that the sum of the squares of $k$ independent $N(0,1)$ variables is $\sim \chi^{2}_{k}$.
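The "sum of squared standard normals" construction can be checked directly by simulation (assuming NumPy; $k = 5$ is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
k, reps = 5, 200_000

# Each row holds k independent N(0, 1) draws; squaring and summing
# each row produces one chi-squared(k) draw.
z = rng.standard_normal(size=(reps, k))
w = (z ** 2).sum(axis=1)

print(w.mean())        # close to E(W) = k = 5
print(w.var())         # close to Var(W) = 2k = 10
print((w >= 0).all())  # support is nonnegative, as claimed
```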

## Derivation of the $\chi^2$ from Gamma functions

Gill discusses the $\chi^2$ distribution as a special case of the gamma PDF. That's fine, but there's actually a much more intuitive way of thinking about it, and one that comports more closely with how it is (most commonly) used in statistics. Formally, a variable $W$ that is distributed as $\chi^2$ with $k$ degrees of freedom has a density of:

$$f(w) = \frac{w^{\frac{k-2}{2}} \exp\left(\frac{-w}{2}\right)}{2^{\frac{k}{2}} \Gamma\left(\frac{k}{2}\right)}$$

where $\Gamma(k) = \int_{0}^{\infty} t^{k-1} \exp(-t) \, dt$ is the gamma integral (see, e.g., Gill, p. 222). As with the normal distribution, the need to write the distribution in this fashion reflects the fact that its CDF has no closed-form solution. The corresponding CDF is

$$F(w) = \frac{\gamma(k/2, w/2)}{\Gamma(k/2)}$$

where $\Gamma(\cdot)$ is as before and $\gamma(\cdot)$ is the [lower incomplete gamma function](http://en.wikipedia.org/wiki/Incomplete_Gamma_function). We write this as $W \sim \chi^{2}_{k}$ (one also occasionally sees $W \sim \chi^{2}(k)$, with the degrees of freedom in parentheses), and say "$W$ is distributed as chi-squared with $k$ degrees of freedom."
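The gamma-function form of the density can be verified numerically against SciPy's implementation (assuming SciPy is available; `chi2_pdf` is our own illustrative function name):

```python
import math
from scipy.stats import chi2

def chi2_pdf(w, k):
    """chi^2_k density, written out from the gamma-function form above."""
    return (w ** ((k - 2) / 2) * math.exp(-w / 2)) / \
        (2 ** (k / 2) * math.gamma(k / 2))

for w in (0.5, 2.0, 7.3):
    print(chi2_pdf(w, 4), chi2.pdf(w, df=4))   # the two columns agree
```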

## Additional points needed on the chi-square

[Philip Schrodt](User:Philip Schrodt) 07:00, 13 July 2011 (PDT)

* Probably want to mention the use in contingency tables here, since the connection isn't obvious.
* Agresti and Finlay state this was introduced by Pearson in 1900, apparently in the context of contingency tables---confirm this; any sort of story here?
* As df becomes very large, the chi-square approximates the normal; this is an asymptotic distribution and, for practical purposes, can be used if df > 50
* Discuss more about the assumption of statistical independence?
* Chi-square as the test for comparing whether an observed frequency fits a known distribution

# Student's $t$ Distribution

For a variable $X$ which is distributed as $t$ with $k$ degrees of freedom, the PDF is:

$$f(x) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}$$

where once again $\Gamma(\cdot)$ is the gamma integral. We write $X \sim t_{k}$, and say "$X$ is distributed as Student's $t$ with $k$ degrees of freedom." The figure below presents $t$ densities for five different values of $k$, along with a standard normal density for comparison.

![t density functions](/images/thumb/d/d8/StatDist.tDists.png/512px-StatDist.tDists.png)

The $t$-distribution is sometimes known as "Student's $t$," after a then-anonymous "student" of the statistician Karl Pearson. The story, from Wikipedia:

− | |||

− | |||

− | |||

− | < | ||

− | |||

− | </ | ||

− | |||

− | |||

− | |||

− | = | ||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | f( | ||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

> The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name). Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of stout. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown to fellow statisticians.


<!-- http://en.wikipedia.org/wiki/Student%27s_t-test -->

Note a few things about $t$:

* The mean/mode/median of a $t$-distributed variate is zero, and its variance is $\frac{k}{k-2}$ (for $k > 2$).
* $t$ looks like a standard normal distribution (symmetrical, bell-shaped) but has thicker "tails" (read: higher probabilities of draws being relatively far from the mean/mode). However...
* ...as $k$ gets larger, $t$ converges to a standard normal distribution; at or above $k = 30$ or so, the two are effectively indistinguishable.

The importance of the $t$ distribution lies in its relationship to the normal and chi-square distributions. In particular, if $Z \sim N(0,1)$ and $W \sim \chi^{2}_{k}$, and $Z$ and $W$ are independent, then

$$\frac{Z}{\sqrt{W/k}} \sim t_{k}$$

That is, the ratio of a $N(0,1)$ variable and a (properly transformed) chi-squared variable follows a $t$ distribution, with degrees of freedom equal to the degrees of freedom of the chi-squared variable. Of course, this also means that $\frac{Z^{2}}{W/k} \sim t^{2}_{k}$.

Since we know that $Z^{2} \sim \chi^{2}_{1}$, this means that the square of a $t$-distributed variate can also be derived as the ratio of a $\chi^{2}_{1}$ variate and a (properly scaled) $\chi^{2}_{k}$ variate.
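The construction $Z/\sqrt{W/k} \sim t_k$ can be illustrated by simulation (assuming NumPy; $k = 5$ is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(7)
k, reps = 5, 200_000

z = rng.standard_normal(reps)          # Z ~ N(0, 1)
w = rng.chisquare(df=k, size=reps)     # W ~ chi^2_k, independent of Z
t = z / np.sqrt(w / k)                 # ~ t_k by the result above

print(t.mean())   # close to 0
print(t.var())    # close to k/(k-2) = 5/3
```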

## Additional points needed on the t distribution

[Philip Schrodt](User:Philip Schrodt) 07:00, 13 July 2011 (PDT)

* May want to note that it is ubiquitous in inference on regression coefficients
* Might want to note somewhere---this might go earlier in the discussion of df---that in most social science research (e.g. survey research and time-series cross-sections), the sample sizes are well above the point where the t is asymptotically normal. The t is actually important only in very small samples, though these can be found in situations such as small subsamples in survey research (are Hispanic ferret owners in Wyoming more likely to support the Tea Party?), situations where the population itself is small (e.g. state membership in the EU, Latin America, or ECOWAS), and experiments with a small number of subjects or cases (this is commonly found in medical research, for example, and it also motivated Gosset's original development of the test, albeit with yeast and hops---we presume---rather than experimental subjects). In these instances, using the conventional normal approximation to the t---in particular, the rule-of-thumb of looking for standard errors less than half the size of the coefficient estimate to establish two-tailed 0.05 significance---will be misleading.

# The $F$ Distribution

An $F$ distribution is the ratio of two chi-squared variates, each divided by its degrees of freedom. If $W_{1}$ and $W_{2}$ are independent and $\sim \chi^{2}_{k}$ and $\sim \chi^{2}_{\ell}$, respectively, then

$$\frac{W_{1}/k}{W_{2}/\ell} \sim F_{k,\ell}$$

That is, the ratio of two (scaled) chi-squared variables is distributed as $F$, with degrees of freedom equal to the number of degrees of freedom in the numerator and denominator variables, respectively.

Formally, if $X$ is distributed as $F$ with $k$ and $\ell$ degrees of freedom, then the PDF of $X$ is:

$$f(x) = \frac{\left(\frac{k\,x}{k\,x + \ell}\right)^{k/2} \left(1 - \frac{k\,x}{k\,x + \ell}\right)^{\ell/2}}{x\; \mathrm{B}(k/2, \ell/2)}$$

where $\mathrm{B}(\cdot)$ is the beta function: $\mathrm{B}(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$. We write $X \sim F_{k,\ell}$, and say "$X$ is distributed as $F$ with $k$ and $\ell$ degrees of freedom."

The $F$ is a two-parameter distribution, with degrees-of-freedom parameters (say $k$ and $\ell$), both of which are limited to the positive integers. An $F$ variate $X$ takes values only on the nonnegative real line. It has expected value $\text{E}(X) = \frac{\ell}{\ell - 2}$ (for $\ell > 2$), which implies that the mean of an $F$-distributed variable converges to 1.0 as $\ell \rightarrow \infty$. Likewise, it has variance

$$\text{Var}(X) = \frac{2\,\ell^2\,(k + \ell - 2)}{k (\ell - 2)^2 (\ell - 4)}$$

(for $\ell > 4$), which bears no simple relationship to either $k$ or $\ell$.

The $F$ distribution is (generally) positively skewed. Examples of some $F$ densities with different values of $k$ and $\ell$ are presented in the figure below.

![F density functions](/images/thumb/4/46/StatDist.FDists.png/512px-StatDist.FDists.png)

If $X \sim F(k, \ell)$, then $\frac{1}{X} \sim F(\ell, k)$ (because $\frac{1}{X} = \frac{1}{(W_{1}/k)/(W_{2}/\ell)} = \frac{W_{2}/\ell}{W_{1}/k}$). In addition, the square of a $t_{k}$-distributed variable is $\sim F(1, k)$. (*Why?* Take the formula for $t$, and square it...)

− | + | <h2>Additional points needed on the F distribution </h2> | |

− | </ | + | <p><a href="User:Philip Schrodt">Philip Schrodt</a> 10:00, 13 July 2011 (PDT) |

− | + | </p> | |

− | + | <ul><li>Discovered by Fisher in 1922, hence "F" | |

− | + | </li><li>Mention how it will be used for <span class="texhtml"><i>R</i><sup>2</sup></span> and ANOVA <strong class='error'>Failed to parse (syntax error): F = MS_\frac{{between},MS_{within}}</strong> | |

− | + | </li></ul> | |

− | + | <ul><li>Square of a <span class="texhtml"><i>t</i><sub><i>k</i></sub></span> statistic is an <span class="texhtml"><i>F</i><sub>1,<i>k</i></sub></span> statistic | |

− | < | + | </li></ul> |

− | + | <h1> Summary: Relationships Among Continuous Distributions </h1> | |

− | + | <p>The substantive importance of all these distributions will become apparent as we move on to sampling distributions and statistical inference. In the meantime, it is useful to consider the relationship between the four distributions we discussion above | |

− | + | </p><p><img src="/images/thumb/2/2e/Continuous.dists.png/512px-Continuous.dists.png" _fck_mw_filename="Continuous.dists.png" _fck_mw_width="512" alt="Continuous.dists.png" /> | |

− | + | </p><p> | |

− | |||

− | |||

− | |||

− | |||

− | = | ||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

<!--DO NOT EDIT THE REFERENCE SECTION--> | <!--DO NOT EDIT THE REFERENCE SECTION--> | ||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | =Problems | + | </p> |

− | + | <h1>References</h1> | |

− | + | <p><span class="fck_mw_template">{{Reflist}}</span> | |

− | + | </p> | |

− | + | <h1>Discussion questions</h1> | |

− | + | <ol><li> | |

− | + | </li><li> | |

− | =Glossary= | + | </li><li> |

+ | </li><li> | ||

+ | </li><li> | ||

+ | </li></ol> | ||

+ | <h1>Problems</h1> | ||

+ | <ol><li> | ||

+ | </li><li> | ||

+ | </li><li> | ||

+ | </li><li> | ||

+ | </li><li> | ||

+ | </li></ol> | ||

+ | <p>=Glossary= | ||

<!-- Here add any keywords or terms introduced on this page. Add them in a list like: | <!-- Here add any keywords or terms introduced on this page. Add them in a list like: | ||

:*[[Def:newterm1]] | :*[[Def:newterm1]] | ||

Line 289: | Line 211: | ||

:*[[Def:newterm3]] | :*[[Def:newterm3]] | ||

Do not edit above this line.--> | Do not edit above this line.--> | ||

− | |||

− | |||

− | |||

+ | </p> | ||

+ | <dl><dd><ul><li>[[Def: ]] | ||

+ | </li><li>[[Def: ]] | ||

+ | </li><li>[[Def: ]] | ||

+ | </li></ul> | ||

+ | </dd></dl> | ||

+ | <p> | ||

<!--Do not edit below this line.--> | <!--Do not edit below this line.--> | ||

− | + | ||

+ | </p> | ||

+ | <pre class="_fck_mw_lspace">__FORCETOC__ | ||

+ | </pre> |

## Revision as of 08:01, 13 July 2011


# Objectives

# Introduction

In the previous chapter we discussed probability theory, which we expressed in terms of a variable $X$. We defined $X$ as a set of realizations of some process, which in turn is governed by rules of probability regarding potential outcomes in the sample space.

The variables we were talking about have been what are called *random variables*, which means that they have a probability distribution. As we noted before, broadly speaking, there are two kinds of random variables: *discrete* and *continuous*.

*Discrete* variables can take on any one of several distinct, mutually-exclusive values.

- Congressperson's ideology score {0, 1, 2, 3, ..., 100}
- An individual's political affiliation (Democrat, Republican, Independent)
- Whether or not a country is a member of the European Union (true/false)

A *continuous* variable can take on *any* value in its range.

- Individual income
- National population

This chapter focuses on a family of continuous distributions that are the most widely used in statistical inference and are found in a wide variety of contexts, both applied and theoretical. The *normal* distribution is the well-known "bell-shaped curve" that most students first encounter in the artificial context of academic testing but, thanks to a powerful result called the central limit theorem, it also occurs in a wide variety of uncontrolled situations where the value of a random variable is determined by the average effect of a large number of random variables with any combination of distributions. The χ^{2}, *t* and *F* distributions can be derived from various combinations of normally-distributed variables, and are used extensively in statistical inference and applied statistics, so it is useful to understand them in some depth.

## Need to do

[[User:Philip Schrodt|Philip Schrodt]] 06:57, 13 July 2011 (PDT)

- Probably need to get most of the probability chapter---which at the moment hasn't been started---written before this one. In particular, will the pdf and cdf be defined there or here?
- Add some of the discrete distributions, particularly the binomial
- Add the uniform?
- Do we add---or link to on another page---the derivation of the mean and standard errors for these: that code is available in CCL on an assortment of places on the web

# The Normal Distribution

We are all used to seeing normal distributions described, and to hearing that something is "normally distributed." We know that a normal distribution is "bell-shaped," and symmetrical, and probably that it has some mean and some standard deviation.

Formally, if *X* is a *normally distributed* variate with mean μ and variance σ^{2}, then:

<img _fckfakelement="true" _fck_mw_math="f(x) = \frac{1}{\sigma \sqrt{2\pi}} \text{exp} \left( - \frac{(x - \mu)^{2}}{2 \sigma^{2}} \right)" src="/images/math/0/5/c/05c01fdab44e6d59e0edc24028e1206a.png" />.

We denote this *X* ∼ *N*(μ, σ^{2}), and say "*X* is distributed normally with mean μ and variance σ^{2}." The symbol φ is often used as a shorthand to represent the normal density above:

<img _fckfakelement="true" _fck_mw_math="X \sim \phi_{\mu, \sigma^{2}}" src="/images/math/d/6/4/d64347ee6feb5ed546c6c65e3674dfb5.png" />.

The corresponding normal CDF -- which is the probability of a normal random variate taking on a value less than or equal to some specified number *x* -- is (as always) the integral of the density up to *x*. This has no simple closed-form solution, so we typically just write:

$F(x) \equiv \Phi_{\mu, \sigma^{2}}(x) = \int_{-\infty}^{x} \phi_{\mu, \sigma^{2}}(t) \, dt.$
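As a concrete numerical sketch (not part of the original text), the normal PDF can be computed directly from the formula above, and Φ can be evaluated through the error function `erf` in Python's standard `math` module, since $\Phi_{\mu,\sigma^2}(x) = \tfrac{1}{2}\left[1 + \mathrm{erf}\!\left(\tfrac{x-\mu}{\sigma\sqrt{2}}\right)\right]$:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # phi_{mu,sigma^2}(x) = exp(-(x-mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Phi has no elementary closed form, but it can be written via erf
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_pdf(0.0))   # 1/sqrt(2*pi) ≈ 0.3989, the peak of the standard normal
print(normal_cdf(0.0))   # 0.5: half the mass lies below the mean
```

Evaluating `normal_cdf(1.96)` gives approximately 0.975, the familiar two-tailed 5% cutoff.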

The figures below show several normal density curves with different means and variances, along with the corresponding cumulative distribution functions.

<img src="/images/thumb/5/55/StatDist.Normals.png/512px-StatDist.Normals.png" _fck_mw_filename="StatDist.Normals.png" _fck_mw_width="512" alt="StatDist.Normals.png" />

<img src="/images/thumb/b/b7/StatDist.NormalCDFs.png/512px-StatDist.NormalCDFs.png" _fck_mw_filename="StatDist.NormalCDFs.png" _fck_mw_width="512" alt="Normal cumulative distribution functions" />

## Bases for the Normal Distribution

The most common justification for the normal distribution has its roots in the *central limit theorem*. Consider *i* = 1, 2, ..., *N* independent, real-valued random variates *X*_{i}, each with finite mean μ_{i} and variance <img _fckfakelement="true" _fck_mw_math="\sigma^{2}_{i} > 0" src="/images/math/8/e/3/8e3e18b02caaa63b535d363869d670c9.png" />. If we define a new variable *X* as the sum of these variables:

<img _fckfakelement="true" _fck_mw_math="X = \sum_{i=1}^{N} X_{i}" src="/images/math/a/7/5/a752c37c7aaf9b5d42055a00f9b5fd37.png" />

then we know that

<img _fckfakelement="true" _fck_mw_math=" \text{E}(X) = \sum_{i=1}^{N} \mu_{i} " src="/images/math/d/a/7/da7eb476d715554622bbefea75301103.png" />

and

<img _fckfakelement="true" _fck_mw_math=" \text{Var}(X) = \sum_{i=1}^{N} \sigma^{2}_{i} " src="/images/math/7/7/d/77de1add6aafc0249d728571a9683b18.png" />

The central limit theorem states that:

<img _fckfakelement="true" _fck_mw_math=" \underset{N \rightarrow \infty}{\lim} X = \underset{N \rightarrow \infty}{\lim} \sum_{i=1}^{N} X_{i} \overset{D}{\rightarrow} N(\cdot) " src="/images/math/8/a/1/8a127d6ed5c5e2dd71ebdf34d8682057.png" />

where the notation <img _fckfakelement="true" _fck_mw_math="\overset{D}{\rightarrow}" src="/images/math/0/9/3/0931fec8f6726354023e382d5c71be2c.png" /> indicates convergence in distribution. That is, as *N* gets sufficiently large, the distribution of the sum of *N* independent random variates with finite mean and variance converges to a normal distribution. As such, we often think of a normal distribution as being appropriate when the observed variable *X* can take on a range of continuous values, and when the observed value of *X* can be thought of as the result of a large number of relatively small, independent "shocks" or perturbations.
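A small simulation (illustrative only, not from the original text) makes the theorem concrete: each Uniform(0,1) draw is decidedly non-normal, yet sums of thirty of them are already close to normally distributed.

```python
import random

random.seed(42)

# Each Uniform(0,1) draw has mean 1/2 and variance 1/12, so a sum of N of them
# has mean N/2 and variance N/12; by the CLT the sum is approximately normal.
N, reps = 30, 20000
sums = [sum(random.random() for _ in range(N)) for _ in range(reps)]

mean = sum(sums) / reps
var = sum((s - mean) ** 2 for s in sums) / reps
print(mean, var)   # near N/2 = 15 and N/12 = 2.5

# A normal distribution puts about 68% of its mass within one SD of the mean
sd = var ** 0.5
share = sum(1 for s in sums if abs(s - mean) <= sd) / reps
print(share)       # near 0.68 if the sums are close to normal
```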

## Properties of the Normal Distribution

- A normal variate *X* has support in <img _fckfakelement="true" _fck_mw_math="\mathfrak{R}" src="/images/math/6/1/0/610bc52ec5a62efd154a01deb92a0d5c.png" />.
- The normal is a two-parameter distribution, where <img _fckfakelement="true" _fck_mw_math="\mu \in (-\infty, \infty)" src="/images/math/8/9/0/8907d5e8f9cb4328f76778e16b69fba7.png" /> and <img _fckfakelement="true" _fck_mw_math="\sigma^{2} \in (0, \infty)" src="/images/math/c/1/0/c100ff23356ea61cfd105a64d8774c53.png" />.
- The normal distribution is always symmetrical (*M*_{3} = 0) and mesokurtic.
- The normal distribution is preserved under a linear transformation. That is, if *X* ∼ *N*(μ, σ^{2}), then *aX* + *b* ∼ *N*(*a*μ + *b*, *a*^{2}σ^{2}). (Why? Recall our earlier results on μ and σ^{2}.)
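The linear-transformation property can be checked by simulation. The sketch below (illustrative, not from the original text; the parameter values are arbitrary) draws from N(2, 9), applies *aX* + *b*, and verifies that the sample mean and variance land near *a*μ + *b* and *a*^{2}σ^{2}:

```python
import random

random.seed(5)

# If X ~ N(mu, sigma^2), then a*X + b ~ N(a*mu + b, a^2 * sigma^2)
mu, sigma, a, b, reps = 2.0, 3.0, 0.5, 1.0, 20000
ys = [a * random.gauss(mu, sigma) + b for _ in range(reps)]

mean = sum(ys) / reps
var = sum((y - mean) ** 2 for y in ys) / reps
print(mean, var)   # near a*mu + b = 2.0 and a^2 * sigma^2 = 2.25
```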

## The Standard Normal Distribution

One linear transformation is especially useful:

<img _fckfakelement="true" _fck_mw_math=" \begin{align} b & = \frac{-\mu}{\sigma} \\ a & = \frac{1}{\sigma} \end{align} " src="/images/math/8/e/e/8ee8272b17591d286ca783dd0a7b5dd0.png" />.

This yields:

<img _fckfakelement="true" _fck_mw_math=" \begin{align} ax + b & \sim N(a\mu+b, a^{2} \sigma^{2}) \\ & \sim N(0,1) \end{align} " src="/images/math/0/4/0/040ae3bbba3a7ab522dca56833ad2722.png" />

This is the *standard normal density function*. We often denote this <img _fckfakelement="true" _fck_mw_math="\phi(\cdot)" src="/images/math/5/2/d/52d16e95602c985d5f23b36ddc663415.png" />, and say that "X is distributed as standard normal." We can also get this by transforming ("standardizing") the normal variate *X*...

- If *X* ∼ *N*(μ, σ^{2}), then <img _fckfakelement="true" _fck_mw_math="Z = \frac{(x - \mu)}{\sigma} \sim N(0,1)" src="/images/math/8/1/e/81e060afad46dc085b84bc31bf94454c.png" />.
- The density function then reduces to:

<img _fckfakelement="true" _fck_mw_math=" f(z) \equiv \phi(z) = \frac{1}{\sqrt{2\pi}} \text{exp} \left[ - \frac{z^{2}}{2} \right] " src="/images/math/e/3/f/e3f634ea319de07e04630c939d8e364f.png" />

Similarly, we often write the CDF for the standard normal as <img _fckfakelement="true" _fck_mw_math="\Phi(\cdot)" src="/images/math/8/9/d/89d767697c1931e19b576aef0e242f9b.png" />.
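Standardization is a one-line computation. The numbers below are purely illustrative (IQ-style scores, not from the original text):

```python
# Standardizing: if X ~ N(mu, sigma^2), then z = (x - mu) / sigma is a
# draw from N(0, 1).  Hypothetical example: scores with mu = 100, sigma = 15.
mu, sigma = 100.0, 15.0
x = 130.0
z = (x - mu) / sigma
print(z)   # 2.0 -- this observation sits two standard deviations above the mean
```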

## Why do we care about the normal distribution?

The normal distribution's importance lies in its relationship to the central limit theorem. As we'll discuss at greater length later, the central limit theorem implies that as one's sample size increases, the distribution of sample means (or other estimates) approaches a normal distribution.

## Additional points needed on the normal

[[User:Philip Schrodt|Philip Schrodt]] 07:00, 13 July 2011 (PDT)

- More extended discussion of the CLT, and a note that if we are dealing with a data generating process where the "error" is the average (or cumulative) effect of a large number of random variables with a variety of distributions, the CLT tells us that the net effect will be normally distributed. This, in turn, explains why linear models that assume Normally distributed error---regression and ANOVA---have proven to be so robust in practice
- Link to a number of examples of normally distributed data...should be easy to find these on the web. E.g. the classical height. Maybe SAT scores, though these are artificially normal
- ref to the wikipedia article; there is also a nice graphic to snag from there---introductory sidebar---which shows the standard normal
- sidebar on the log-normal?
- something about the bivariate normal and some nice graphics of this?
- sidebar on the issue of fat tails and how these destroyed the economy in 2007?---there is a fairly readable Wired article on this: http://www.wired.com/techbiz/it/magazine/17-03/wp_quant

# The χ^{2} Distribution

The chi-square (χ^{2}) distribution is a one-parameter distribution defined only over positive values. If *Z* ∼ *N*(0,1), then <img _fckfakelement="true" _fck_mw_math="Z^{2} \sim \chi^{2}_{1}" src="/images/math/6/3/2/6328cdfbc5adb977368c7175e79b1484.png" />. That is, *the square* of a *N*(0,1) variable is chi-squared with one degree of freedom. The fact that the square of a standard normal variate is a one-degree-of-freedom chi-square variable also explains why (e.g.) a chi-squared variate is defined only for nonnegative real numbers. If *W*_{1}, *W*_{2}, ..., *W*_{k} are all independent <img _fckfakelement="true" _fck_mw_math="\chi^{2}_{1}" src="/images/math/9/e/b/9eb85f77631ff93a56a6bd530579baac.png" /> variables, then <img _fckfakelement="true" _fck_mw_math="\sum_{i=1}^{k}W_{i} \sim \chi^{2}_{k}" src="/images/math/f/2/8/f2879b38c36a66d597e5669963650a44.png" />. (The sum of *k* independent chi-squared variables is chi-squared with *k* degrees of freedom.) By extension, the sum of the squares of *k* independent *N*(0,1) variables is also <img _fckfakelement="true" _fck_mw_math="\sim \chi^{2}_{k}" src="/images/math/d/f/b/dfb0914c2a449a81c51f1d89ea4ec283.png" />.

The χ^{2} distribution is positively skewed, with E(*W*) = *k* and Var(*W*) = 2*k*. The figure below presents five χ^{2} densities with different values of *k*.

<img src="/images/thumb/8/89/StatDist.ChiSquares.png/512px-StatDist.ChiSquares.png" _fck_mw_filename="StatDist.ChiSquares.png" _fck_mw_width="512" alt="StatDist.ChiSquares.png" />
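These moments are easy to verify by simulation (an illustrative sketch, not part of the original text): build chi-squared draws as sums of *k* squared standard normals and check the sample mean and variance against *k* and 2*k*.

```python
import random

random.seed(7)

# W = sum of k squared independent N(0,1) draws is chi-squared with k df,
# so E(W) = k and Var(W) = 2k.
k, reps = 5, 20000
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(reps)]

mean = sum(draws) / reps
var = sum((w - mean) ** 2 for w in draws) / reps
print(mean, var)   # near k = 5 and 2k = 10
```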

Need to define degrees of freedom here

### Characteristics of the χ^{2} Distribution

If *W*_{j} and *W*_{k} are independent <img _fckfakelement="true" _fck_mw_math="\chi^{2}_{j}" src="/images/math/6/b/0/6b0005e43b70a25520c6a6abc4d0ea47.png" /> and <img _fckfakelement="true" _fck_mw_math="\chi^{2}_{k}" src="/images/math/f/0/f/f0ff6604e7eae2605b1b118a3528ea32.png" /> variables, respectively, then *W*_{j} + *W*_{k} is <img _fckfakelement="true" _fck_mw_math="\sim \chi^{2}_{j+k}" src="/images/math/1/0/c/10cddb6bd869d6187a32e742d90a0891.png" />; this result can be extended to any number of independent chi-squared variables. This in turn implies that the sum of the squares of *k* independent *N*(0,1) variables is also <img _fckfakelement="true" _fck_mw_math="\sim \chi^{2}_{k}" src="/images/math/d/f/b/dfb0914c2a449a81c51f1d89ea4ec283.png" />.

## Derivation of the χ^{2} from Gamma functions

Gill discusses the χ^{2} distribution as a special case of the gamma PDF. That's fine, but there's actually a much more intuitive way of thinking about it, and one that comports more closely with how it is (most commonly) used in statistics. Formally, a variable *W* that is distributed as χ^{2} with *k* degrees of freedom has a density of:

<img _fckfakelement="true" _fck_mw_math=" f(w) = \frac{w^{\frac{k-2}{2}} \exp(\frac{-w}{2})}{2^{\frac{k}{2}} \Gamma(\frac{k}{2})} " src="/images/math/5/1/5/5158ca285caaeda5490da80182369578.png" />

where <img _fckfakelement="true" _fck_mw_math="\Gamma(k) = \int_{0}^{\infty} t^{k - 1} \text{exp}(-t) \, dt" src="/images/math/c/9/d/c9d5de0db64c93347442795b22a9f129.png" /> is the gamma integral (see, e.g., Gill, p. 222). As with the normal distribution, the corresponding CDF has no closed-form solution; it is written as

<img _fckfakelement="true" _fck_mw_math=" F(w)=\frac{\gamma(k/2,w/2)}{\Gamma(k/2)} " src="/images/math/1/3/d/13d302967fd94bd45df3aa569e5503f2.png" />

where <img _fckfakelement="true" _fck_mw_math="\Gamma(\cdot)" src="/images/math/1/1/e/11ee491fb6e261ad0b4f721d59ea7318.png" /> is as before and <img _fckfakelement="true" _fck_mw_math="\gamma(\cdot)" src="/images/math/e/4/8/e4812a607c060d3c6b4680f1b884021d.png" /> is the [lower incomplete gamma function](http://en.wikipedia.org/wiki/Incomplete_Gamma_function). We write this as <img _fckfakelement="true" _fck_mw_math="W \sim \chi^{2}_{k}" src="/images/math/4/2/9/4299c7d4b1834264833030cc65295f32.png" /> (one also occasionally sees *W* ∼ χ^{2}(*k*), with the degrees of freedom in parentheses), and say "*W* is distributed as chi-squared with *k* degrees of freedom."
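The density above is straightforward to evaluate numerically, since Python's standard `math.gamma` supplies the gamma integral. A minimal sketch (not from the original text); for *k* = 2 the density reduces to exp(−*w*/2)/2, which gives a handy check:

```python
import math

def chi2_pdf(w, k):
    # f(w) = w^{k/2 - 1} * exp(-w/2) / (2^{k/2} * Gamma(k/2)),  for w > 0
    return (w ** (k / 2 - 1) * math.exp(-w / 2)) / (2 ** (k / 2) * math.gamma(k / 2))

# Sanity check: for k = 2 the density is exp(-w/2)/2
print(chi2_pdf(2.0, 2), math.exp(-1.0) / 2)
```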

## Additional points needed on the chi-square

<a href="User:Philip Schrodt">Philip Schrodt</a> 07:00, 13 July 2011 (PDT)

- Probably want to mention the use in contingency tables here, since the connection isn't obvious.
- Agresti and Finlay state this was introduced by Pearson in 1900, apparently in the context of contingency tables---confirm this, any sort of story here?
- As df becomes very large, the chi-square approximates the normal; this is an asymptotic distribution and, for practical purposes, can be used if df > 50
- Discuss more about the assumption of statistical independence?
- Chi-square as the test for comparing whether an observed frequency fits a known distribution

# Student's *t* Distribution

For a variable *X* which is distributed as *t* with *k* degrees of freedom, the PDF function is:

<img _fckfakelement="true" _fck_mw_math=" f(x) = \frac{\Gamma(\frac{k+1}{2})} {\sqrt{k\pi}\,\Gamma(\frac{k}{2})} \left(1+\frac{x^2}{k} \right)^{-(\frac{k+1}{2})}\! " src="/images/math/e/6/d/e6d0efa21a2ef9e5400f5d5cfdac879f.png" />

where once again <img _fckfakelement="true" _fck_mw_math="\Gamma(\cdot)" src="/images/math/1/1/e/11ee491fb6e261ad0b4f721d59ea7318.png" /> is the gamma integral. We write *X* ∼ *t*_{k}, and say "*X* is distributed as Student's *t* with *k* degrees of freedom." The figure below presents *t* densities for five different values of *k*, along with a standard normal density for comparison.

<img src="/images/thumb/d/d8/StatDist.tDists.png/512px-StatDist.tDists.png" _fck_mw_filename="StatDist.tDists.png" _fck_mw_width="512" alt="StatDist.tDists.png" />

The t-distribution is sometimes known as "Student's *t*", after a then-anonymous "student" of the statistician Karl Pearson. The story, from Wikipedia:

The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name). Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of stout. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown to fellow statisticians.

Note a few things about *t*:

- The mean/mode/median of a *t*-distributed variate is zero, and its variance is <img _fckfakelement="true" _fck_mw_math="\frac{k}{k - 2}" src="/images/math/d/7/5/d75c6ef4ee2b360ba8f65eb687e33f1e.png" /> (for *k* > 2).
- *t* looks like a standard normal distribution (symmetrical, bell-shaped) but has thicker "tails" (read: higher probabilities of draws being relatively far from the mean/mode). However...
- ...as *k* gets larger, *t* converges to a standard normal distribution; at or above *k* = 30 or so, the two are effectively indistinguishable.

The importance of the *t* distribution lies in its relationship to the normal and chi-square distributions. In particular, if *Z*˜*N*(0,1) and <img _fckfakelement="true" _fck_mw_math="W \sim \chi^{2}_{k}" src="/images/math/4/2/9/4299c7d4b1834264833030cc65295f32.png" />, and *Z* and *W* are independent, then

<img _fckfakelement="true" _fck_mw_math="\frac{Z}{\sqrt{W/k}} \sim t_{k} " src="/images/math/a/a/2/aa26d30be9d0c50902042558dcf5f532.png" />

That is, the ratio of an *N*(0,1) variable to the square root of a (properly scaled) chi-squared variable follows a *t* distribution, with d.f. equal to the number of d.f. of the chi-squared variable. Of course, this also means that $\frac{Z^{2}}{W/k} \sim t^{2}_{k}$.

Since we know that <img _fckfakelement="true" _fck_mw_math="Z^{2} \sim \chi^{2}_{1}" src="/images/math/6/3/2/6328cdfbc5adb977368c7175e79b1484.png" />, this means that the square of a *t*_{k} variate can also be derived as the ratio of a <img _fckfakelement="true" _fck_mw_math="\chi^{2}_{1}" src="/images/math/9/e/b/9eb85f77631ff93a56a6bd530579baac.png" /> variate and a (df-scaled) <img _fckfakelement="true" _fck_mw_math="\chi^{2}_{k}" src="/images/math/f/0/f/f0ff6604e7eae2605b1b118a3528ea32.png" /> variate.
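The defining ratio can be turned into a generator of *t* draws (an illustrative simulation, not from the original text): draw *Z* ∼ *N*(0,1), build *W* ∼ χ²_k as a sum of squared normals, and form *Z*/√(*W*/*k*). The sample mean should sit near 0 and the sample variance near *k*/(*k* − 2).

```python
import random, math

random.seed(3)

# t_k draws from the definition: t = Z / sqrt(W/k), with Z ~ N(0,1) and
# W ~ chi-squared_k built as a sum of k squared independent N(0,1) draws.
k, reps = 10, 20000
ts = []
for _ in range(reps):
    z = random.gauss(0, 1)
    w = sum(random.gauss(0, 1) ** 2 for _ in range(k))
    ts.append(z / math.sqrt(w / k))

mean = sum(ts) / reps
var = sum((t - mean) ** 2 for t in ts) / reps
print(mean, var)   # mean near 0; variance near k/(k-2) = 1.25
```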

## Additional points needed on the t distribution

[[User:Philip Schrodt|Philip Schrodt]] 07:00, 13 July 2011 (PDT)

- May want to note that it is ubiquitous in the inference on regression coefficients
- Might want to note somewhere---this might go earlier in the discussion of df---that in most social science research (e.g. survey research and time-series cross-sections), the sample sizes are well above the point where the t is asymptotically normal. The t is actually important only in very small samples, though these can be found in situations such as small subsamples in survey research (are Hispanic ferret owners in Wyoming more likely to support the Tea Party?), situations where the population itself is small (e.g. state membership in the EU, Latin America, or ECOWAS), and experiments with a small number of subjects or cases (this is commonly found in medical research, for example, and it also motivated Gosset's original development of the test, albeit with yeast and hops---we presume---rather than experimental subjects). In these instances, using the conventional normal approximation to the t---in particular, the rule-of-thumb of looking for standard errors less than half the size of the coefficient estimate to establish two-tailed 0.05 significance---will be misleading.

# The *F* Distribution

An *F* distribution is the ratio of two chi-squared variates, each divided by its degrees of freedom. If *W*_{1} and *W*_{2} are independent and <img _fckfakelement="true" _fck_mw_math="\sim \chi^{2}_{k}" src="/images/math/d/f/b/dfb0914c2a449a81c51f1d89ea4ec283.png" /> and <img _fckfakelement="true" _fck_mw_math="\chi^{2}_{\ell}" src="/images/math/3/f/c/3fc30391893e01b4c49b1a8ec41b574c.png" />, respectively, then

$\frac{W_{1}/k}{W_{2}/\ell} \sim F_{k,\ell}$

That is, the ratio of two (df-scaled) chi-squared variables is distributed as *F* with d.f. equal to the number of d.f. in the numerator and denominator variables, respectively.
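A quick simulation (illustrative only, not from the original text) constructs *F* draws from two df-scaled chi-squared variates and checks that the sample mean is near ℓ/(ℓ − 2):

```python
import random

random.seed(11)

# F_{k,l} draw = (W1/k) / (W2/l), with W1 ~ chi2_k and W2 ~ chi2_l independent,
# each built as a sum of squared independent N(0,1) draws.
k, l, reps = 4, 20, 20000
fs = []
for _ in range(reps):
    w1 = sum(random.gauss(0, 1) ** 2 for _ in range(k))
    w2 = sum(random.gauss(0, 1) ** 2 for _ in range(l))
    fs.append((w1 / k) / (w2 / l))

mean = sum(fs) / reps
print(mean)   # near l/(l-2) = 20/18 ≈ 1.11
```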

Formally, if *X* is distributed as *F* with *k* and <img _fckfakelement="true" _fck_mw_math="\ell" src="/images/math/3/3/4/334ce9eb79df1178b0380461c9eaa09e.png" /> degrees of freedom, then the PDF of *X* is:

<img _fckfakelement="true" _fck_mw_math=" f(x) = \frac{\left(\frac{k\,x}{k\,x + \ell}\right)^{k/2} \left(1-\frac{k\,x}{k\,x + \ell}\right)^{\ell/2}}{x\; \mathrm{B}(k/2, \ell/2)} " src="/images/math/7/6/8/76893d623336cf8f6974dd4a40735ec7.png" />

where <img _fckfakelement="true" _fck_mw_math="\mathrm{B}(\cdot)" src="/images/math/c/2/d/c2d3433e3640c11e1f072c4006e17c11.png" /> is the *beta function*; that is, <img _fckfakelement="true" _fck_mw_math="\mathrm{B}(x,y) = \int_0^1t^{x-1}(1-t)^{y-1}\,dt" src="/images/math/0/6/8/0689adb68c7ec29099fc40c34aa0dad5.png" />. We write <img _fckfakelement="true" _fck_mw_math="X \sim F_{k,\ell}" src="/images/math/a/2/c/a2c04e5e0527e6a1156e6bbc58d89c7f.png" />, and say "*X* is distributed as *F* with *k* and <img _fckfakelement="true" _fck_mw_math="\ell" src="/images/math/3/3/4/334ce9eb79df1178b0380461c9eaa09e.png" /> degrees of freedom."

The *F* is a two-parameter distribution, with degrees of freedom parameters (say *k* and <img _fckfakelement="true" _fck_mw_math="\ell" src="/images/math/3/3/4/334ce9eb79df1178b0380461c9eaa09e.png" />), both of which are limited to the positive integers. An *F* variate *X* takes values only on the non-negative real line; it has expected value <img _fckfakelement="true" _fck_mw_math="\text{E}(X) = \frac{\ell}{\ell - 2}," src="/images/math/8/d/a/8dab1cce0da2d33b88c188a1cab3c153.png" /> (for $\ell > 2$), which implies that the mean of an *F*-distributed variable converges on 1.0 as <img _fckfakelement="true" _fck_mw_math="\ell \rightarrow \infty" src="/images/math/d/1/3/d132d0a78b8c0819b6187998c23cd1fb.png" />. Likewise, it has variance

<img _fckfakelement="true" _fck_mw_math="\text{Var}(X) = \frac{2\,\ell^2\,(k+\ell-2)}{k (\ell-2)^2 (\ell-4)}, " src="/images/math/3/5/1/35190b821a12585a7f4b779386f719ac.png" /> (for $\ell > 4$), which bears no simple relationship to either *k* or <img _fckfakelement="true" _fck_mw_math="\ell" src="/images/math/3/3/4/334ce9eb79df1178b0380461c9eaa09e.png" />.

The *F* distribution is (generally) positively skewed. Examples of some *F* densities with different values of *k* and <img _fckfakelement="true" _fck_mw_math="\ell" src="/images/math/3/3/4/334ce9eb79df1178b0380461c9eaa09e.png" /> are presented in the figure below.

<img src="/images/thumb/4/46/StatDist.FDists.png/512px-StatDist.FDists.png" _fck_mw_filename="StatDist.FDists.png" _fck_mw_width="512" alt="StatDist.FDists.png" />

If <img _fckfakelement="true" _fck_mw_math="X \sim F(k, \ell)" src="/images/math/3/0/a/30a268520cf8c7ecc505b7280344f1b2.png" />, then <img _fckfakelement="true" _fck_mw_math="\frac{1}{X} \sim F(\ell, k)" src="/images/math/c/2/b/c2b8fcb83c04471d8606d0cb75b97fbe.png" /> (because $\frac{1}{X} = \frac{1}{(W_{1}/k)\,/\,(W_{2}/\ell)} = \frac{W_{2}/\ell}{W_{1}/k}$). In addition, the square of a *t*-distributed variable is ∼ *F*(1,*k*) (*why*? -- take the formula for *t*, and square it...)

## Additional points needed on the F distribution

[[User:Philip Schrodt|Philip Schrodt]] 10:00, 13 July 2011 (PDT)

- Discovered by Fisher in 1922, hence "F"
- Mention how it will be used for *R*^{2} and ANOVA: $F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$
- Square of a *t*_{k} statistic is an *F*_{1,k} statistic

# Summary: Relationships Among Continuous Distributions

The substantive importance of all these distributions will become apparent as we move on to sampling distributions and statistical inference. In the meantime, it is useful to consider the relationships among the four distributions discussed above.

<img src="/images/thumb/2/2e/Continuous.dists.png/512px-Continuous.dists.png" _fck_mw_filename="Continuous.dists.png" _fck_mw_width="512" alt="Continuous.dists.png" />

# References

# Discussion questions

# Problems

# Glossary

- [[Def: ]]
- [[Def: ]]
- [[Def: ]]

__FORCETOC__