# Objectives

• Learn the difference between population parameters and sample statistics.
• Understand how

# Introduction

In this section, we will introduce the notion of a sampling distribution. The sampling distribution is a theoretical idea in statistics that helps us understand many important problems, such as hypothesis testing and confidence intervals.

# Population parameters and sample statistics

Let's say we have a number we're interested in calculating for our research problem. For instance, we might be interested in the percentage of voters who will vote for Romney in an upcoming election. The population parameter is the number we are interested in as it exists in the population. In the case of support for Romney, the population parameter is the true percentage of the whole population that will vote for Romney in the upcoming election. It is the population parameter that we would really like to be able to calculate. However, the only way to find the true population parameter is to have data for the whole population. For instance, many countries conduct a national census every ten years in which they try to reach every single person living in the country. In this case, you could claim to have a true population parameter because you are accessing the whole population.

Usually we don't have the resources to access the entire population, which is why we take a sample. A sample statistic is the number that we calculate from the sample. If we are interested in the percentage of people who plan to vote for Romney, the sample statistic is the percentage of our sample that would vote for Romney.

For any one population parameter, there are many possible sample statistics. That's because you could take more than one sample from a population, and each of these might have a slightly different statistic. For instance, if you took two different samples and asked the people in each sample whether they planned to vote for Romney in the election, the percentage of people who plan to vote for Romney in each sample might be different.
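The difference between the one population parameter and the many possible sample statistics can be sketched in a short simulation. Everything in it is an assumption made for illustration: the population of 100,000 voters, the 48% level of support, and the helper `sample_percentage` are all made up and are not polling data.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 voters; 1 = plans to vote for the candidate.
# The 48% level of support is invented for illustration.
POPULATION_SIZE = 100_000
supporters = 48_000
population = [1] * supporters + [0] * (POPULATION_SIZE - supporters)

# Population parameter: calculated from every member of the population.
population_parameter = 100 * sum(population) / len(population)


def sample_percentage(population, n):
    """Draw a simple random sample of size n and return the percentage of supporters."""
    sample = random.sample(population, n)
    return 100 * sum(sample) / n


# Two separate samples of 1000 voters each give two sample statistics.
first = sample_percentage(population, 1000)
second = sample_percentage(population, 1000)

print(f"Population parameter:       {population_parameter:.1f}%")
print(f"Sample statistic, sample 1: {first:.1f}%")
print(f"Sample statistic, sample 2: {second:.1f}%")
```

The population parameter is a single fixed number, while each call to `sample_percentage` may return a somewhat different value because it depends on which voters happen to be drawn.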

# Sampling distribution

We can never be sure that the sample statistic we have calculated is the same as the population parameter, which is the number we are truly interested in. Why, then, would we calculate the sample statistic in the first place? If we have sampled correctly using a probability sample, we should be able to figure out how close our statistic is likely to be to the population parameter. To do this we will need to use a tool called the sampling distribution.

The sampling distribution is a hypothetical concept. I will describe how a sampling distribution is constructed in the following paragraphs, but it is important to remember that you do not construct the sampling distribution when you are doing a research project. You can, however, learn certain things about the sampling distribution without constructing it. The things you learn about the sampling distribution will help you figure out how close the sample statistic is to the population parameter.

The sampling distribution is a type of frequency distribution. The sampling distribution gives you the distribution of sample statistics over all possible samples of a population.
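For a very small population, "all possible samples" can actually be listed, which makes the definition concrete. The following is a minimal sketch under stated assumptions: the five ages are invented, the sample size of 2 is chosen only to keep the list short, and samples are drawn without replacement.

```python
from collections import Counter
from itertools import combinations

# Hypothetical population of five people's ages (made up for illustration).
ages = [20, 30, 40, 50, 60]
population_mean = sum(ages) / len(ages)

# Every possible sample of size 2, drawn without replacement.
all_samples = list(combinations(ages, 2))

# The sample statistic (here, the mean) for each possible sample.
sample_means = [sum(s) / len(s) for s in all_samples]

# The sampling distribution: how often each value of the statistic occurs.
sampling_distribution = Counter(sample_means)

print(f"Population mean: {population_mean}")
for mean, count in sorted(sampling_distribution.items()):
    print(f"sample mean {mean:4.1f}: {'#' * count}")
```

With five people there are only ten possible samples of size 2, so the whole sampling distribution fits on the screen. With a realistic population the number of possible samples is far too large to enumerate, which is why the sampling distribution stays a theoretical idea in practice.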

## Example: Average age of residents of Maine

In real life, we never construct a sampling distribution. However, this example will walk you through how one would be constructed so that you can better understand what a sampling distribution is.

Let's say you are interested in the average age of residents of the state of Maine. You cannot ask everyone in the state their age, so you ask a sample of 1000 individuals how old they are. You then take the mean of these numbers. This is your sample statistic for the average age of residents of the state of Maine; let's say the average age in the sample is 64. The beginning of the sampling distribution would look like this:

In this distribution, there's only one sample with an average age of 64.

Let's say we do a second sample of 1000 people, calculate the average and add it to the graph. In this case the average age is 66. The graph now shows that one sample has an average age of 64 and that one sample has an average age of 66:

Let's say we take another six samples and calculate the average age of each of these. The graph looks like this:

If we were to continue to do this until we had taken all possible samples and graphed the average age of each one, this would be the sampling distribution of the average age.
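The procedure of drawing sample after sample and tallying the average age can be imitated with a simulation. This is only a sketch: the population below is 50,000 randomly generated ages, not real data about Maine, so its averages will not match the 64 and 66 used in the example. It also draws just 200 samples rather than all possible ones, and the names `population_ages` and `sample_mean_age` are hypothetical.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the population: 50,000 made-up ages from 0 to 99.
# These are NOT real data about Maine.
population_ages = [random.randint(0, 99) for _ in range(50_000)]

SAMPLE_SIZE = 1000  # matches the sample size used in the example
NUM_SAMPLES = 200   # a small number of samples, not all possible ones


def sample_mean_age(population, n):
    """Draw one simple random sample of size n and return its mean age."""
    sample = random.sample(population, n)
    return sum(sample) / n


# Tally the mean of each sample, rounded to the nearest whole year.
tally = Counter(
    round(sample_mean_age(population_ages, SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
)

# Text histogram: one '#' per sample with that average age.
for age in sorted(tally):
    print(f"{age:3d} | {'#' * tally[age]}")
```

Each `#` plays the role of one sample added to the graph. Raising `NUM_SAMPLES` moves the tally closer to the full sampling distribution, although listing every possible sample of 1000 people is not feasible.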