# Objectives

• Understand how two variables can be associated.
• Introduce cross-tabulations.
• Learn how to make cross-tabulations.
• Learn how to best present a cross-tabulation.

# Introduction

Although it is important to quantify and describe individual variables, most scholarship is concerned with how two concepts or variables are related. For example, researchers might ask:

• Do people who get most of their news from television trust other people less than people who get their news primarily from newspapers or the internet?
• Are Catholics more likely to be pro-life than Protestants?
• Do democracies suffer less from corruption than non-democracies?
• Are 18- to 25-year-olds less interested in politics than older adults?
• Are individuals who view the economy favorably more likely to vote for the incumbent government?

To answer questions like these, we will discuss how to build tables that test hypotheses about whether two categorical variables are related to each other. These bivariate tables are called contingency tables or cross-tabulations (cross-tabs for short); the terms are interchangeable. Cross-tabulate can also be used as a verb to describe the process of carrying out this kind of bivariate analysis.

When two variables are related, differences or changes in the value of one variable coincide with differences or changes in the value of the second variable. By combining the frequency tables of the two variables into a cross-tabulation, scholars can look at the pattern of results to see whether the variables are associated. Two variables are said to be associated if knowing the value of one variable lets you guess the value of the other more accurately than you could without that knowledge. Measures of association indicate how strongly two variables are associated with each other.

This section will focus on comparing two categorical (nominal or ordinal) variables in a cross-tabulation. After describing how to make and interpret a cross-tabulation, we will discuss how to visually assess whether the variables are associated and provide some suggestions for making effective tables. We will also discuss the impact of recoding variables. In the following section, we will look at how scholars test whether a relationship could be due to random chance, and at statistical measures of association.

The relationship between two continuous variables is analyzed in a correlation analysis, which is covered in a later section.

# Cross-tabulation

A cross-tabulation takes the frequency table for one variable and combines it with the frequency table for a second variable. A cross-tabulation can be accurately described as a bivariate frequency distribution. One variable's value categories make up the rows of the new two-dimensional table (typically the explained or dependent variable), while the second variable's value categories make up the columns (typically the explanatory or independent variable). Here is an example:

### Example 1: Cross-tabulation of gun ownership and support for gun permits (2010)

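| Favor or Oppose Gun Permits | Have Gun in Home: Yes | Have Gun in Home: No | Have Gun in Home: Refused | Total |
|---|---|---|---|---|
| Favor | 242 (61.7%) | 683 (81.9%) | 18 (46.2%) | 943 (74.6%) |
| Oppose | 150 (38.3%) | 151 (18.1%) | 21 (53.9%) | 322 (25.5%) |
| Total | 392 (100.0%) | 834 (100.0%) | 39 (100.0%) | 1,265 (100.0%) |

Cell entries are frequencies, with column percentages in parentheses.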

The two variables that were cross-tabulated above are:

• GUNLAW: Would you favor or oppose a law which would require a person to obtain a police permit before he or she could carry a gun?
• OWNGUN: Do you happen to have in your home any guns or revolvers?

These were taken from the 2010 U.S. General Social Survey, a poll of Americans [1]. The value categories for favoring or opposing gun permits make up the rows, and the value categories for having a gun in the home make up the columns.
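The idea of combining two variables into one table of counts can be sketched in a few lines of Python. This is only an illustration: the raw survey file is not reproduced here, so the sketch rebuilds one (GUNLAW, OWNGUN) pair per respondent from the cell counts in Example 1, and the names `cell_counts`, `respondents`, and `crosstab` are made up for this example.

```python
from collections import Counter

# Cell counts copied from Example 1: (GUNLAW answer, OWNGUN answer) -> frequency.
cell_counts = {
    ("Favor", "Yes"): 242, ("Favor", "No"): 683, ("Favor", "Refused"): 18,
    ("Oppose", "Yes"): 150, ("Oppose", "No"): 151, ("Oppose", "Refused"): 21,
}

# Assumption: stand in for the raw data by repeating each pair once per respondent.
respondents = [pair for pair, n in cell_counts.items() for _ in range(n)]


def crosstab(pairs):
    """Count how many observations fall into each (row value, column value) cell."""
    return Counter(pairs)


table = crosstab(respondents)
row_labels = ["Favor", "Oppose"]          # dependent variable in the rows
col_labels = ["Yes", "No", "Refused"]     # independent variable in the columns

# Print one line per row of the cross-tabulation.
for row in row_labels:
    print(row, [table[(row, col)] for col in col_labels])
```

Statistical packages offer the same operation as a built-in command; the point of the sketch is only that each cell is a count of the cases that share one row value and one column value.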

## Components of a contingency table

The row and column labels are called the category labels or value labels. In the top left corner is the variable label for the row variable. Above the column category labels is the variable label for the column variable.

Unless the table says otherwise, the boxes in the middle of the table contain frequencies: the number of observations or cases that fall into each box. Each of these boxes is called a data cell, or cell for short.

Look at the top-left data cell of Example 1 above. The frequency in this cell is 242, which means that 242 respondents said that they have a gun in their home and that they support requiring people to get a permit from the police before carrying a gun.

In the same cell, we also find a percentage (61.7%): the share of respondents with a gun in the home who said that they support requiring people to get a permit from the police before carrying a gun. Percentages in contingency tables should not be reported to more than one digit past the decimal point. The percentages displayed in a contingency table could be row percentages (the percentage of observations in the cell relative to the total number of observations in the row), column percentages (the percentage of observations in the cell relative to the total number of observations in the column), and/or total percentages (the percentage of observations in the cell relative to the total number of observations in the entire table). This table uses column percentages, which are the most common choice in contingency tables.
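The three kinds of percentages differ only in the denominator. The following sketch, which uses the cell frequencies from Example 1, computes all three for every cell; the variable and function names are illustrative, and results recomputed from the printed counts may differ from the published table in the last digit.

```python
# Cell frequencies copied from Example 1 (rows: GUNLAW, columns: OWNGUN).
counts = {
    "Favor":  {"Yes": 242, "No": 683, "Refused": 18},
    "Oppose": {"Yes": 150, "No": 151, "Refused": 21},
}
columns = ["Yes", "No", "Refused"]

# Denominators: row totals, column totals, and the grand total.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {col: sum(counts[row][col] for row in counts) for col in columns}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Percentage rounded to one digit past the decimal point."""
    return round(100 * part / whole, 1)


# Same cell counts, three different denominators.
column_pct = {r: {c: pct(counts[r][c], col_totals[c]) for c in columns} for r in counts}
row_pct = {r: {c: pct(counts[r][c], row_totals[r]) for c in columns} for r in counts}
total_pct = {r: {c: pct(counts[r][c], grand_total) for c in columns} for r in counts}

print(column_pct["Favor"]["Yes"])  # share of gun-owning respondents who favor permits
print(row_pct["Favor"]["Yes"])     # share of permit supporters who have a gun at home
print(total_pct["Favor"]["Yes"])   # share of all respondents who fall in this cell
```

For the top-left cell, the column percentage is 61.7, the row percentage is 25.7, and the total percentage is 19.1, so the same frequency of 242 supports three quite different statements depending on which total it is divided by.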

You can find the total number of observations in each row at the end of the row and the total number of observations in each column at the bottom of the column. These numbers are called marginal totals. For example, the total number of respondents who indicated that they have a gun in their home, 392, can be found in the first data cell of the bottom row.

You may also see percentages in these cells, which are called marginal percentages. The marginal percentage at the end of the top row in the table above indicates the percentage of respondents who said that they support requiring people to get a permit from the police before carrying a gun: 943 respondents, or 74.6% of the 1,265 total respondents. Because this example uses column percentages, the marginal percentages at the bottom of each column are all 100%, which clearly conveys to the reader that the percentages are column percentages and sum to 100% in each column. It is neither unusual nor a problem if, due to rounding, the percentages in a column sum to 99% or 101%.
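Marginal totals are simply sums across the rows and down the columns. A minimal sketch using the Example 1 frequencies follows; the names are illustrative. Note that recomputing the marginal percentage for "Favor" from the printed counts gives roughly 74.5%, a tenth of a point away from the published 74.6%; small last-digit differences like this can come from rounding or from how the source software handled the data.

```python
# Cell frequencies copied from Example 1 (rows: GUNLAW, columns: OWNGUN).
counts = {
    "Favor":  {"Yes": 242, "No": 683, "Refused": 18},
    "Oppose": {"Yes": 150, "No": 151, "Refused": 21},
}
columns = ["Yes", "No", "Refused"]

# Marginal totals: sum across each row and down each column.
row_marginals = {row: sum(cells.values()) for row, cells in counts.items()}
col_marginals = {col: sum(counts[row][col] for row in counts) for col in columns}

# Both sets of marginals must add up to the same grand total.
grand_total = sum(row_marginals.values())
assert grand_total == sum(col_marginals.values())

# Row marginal percentages: each row total as a share of the grand total.
row_marginal_pct = {row: 100 * n / grand_total for row, n in row_marginals.items()}

print(row_marginals)
print(col_marginals)
print(grand_total)
for row, share in row_marginal_pct.items():
    print(f"{row}: {share:.2f}%")
```

Checking that the row marginals and the column marginals produce the same grand total is a quick way to catch a mistyped cell when copying a table by hand.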

The grand total of observations (1,265) can be found in the bottom-right corner. Since this example uses survey data, this is the total number of respondents whom the survey polled (including the 39 people who refused to say whether they have a gun in their home). Borders, colors, and underlined or highlighted text are used at the table creator's discretion for clarity or emphasis.

# References

<references group=""></references>

# Problems

1. Based on the data below:
• What percentage of voters supported the incumbent party?
• How many voters who viewed the economy as having worsened voted for the incumbent party?
• Based on the data in the table below, does there appear to be a relationship between retrospective economic evaluations and support for the incumbent party?

### Cross-tabulation of retrospective economic evaluations and support for the incumbent government (Canada 2011)

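| Incumbent vote | Retrospective evaluation of national economy: Worse | Retrospective evaluation of national economy: Same | Retrospective evaluation of national economy: Better | Total |
|---|---|---|---|---|
| Did not vote for incumbent | 384 (73.7%) | 791 (67.6%) | 454 (45.1%) | 1,629 (60.4%) |
| Voted for incumbent | 137 (26.3%) | 379 (32.4%) | 552 (54.9%) | 1,068 (39.6%) |
| Total | 521 (100.0%) | 1,170 (100.0%) | 1,006 (100.0%) | 2,697 (100.0%) |

Cell entries are frequencies, with column percentages in parentheses.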

The two variables that were cross-tabulated above are:

• CPS11_39 (recoded to remove missing cases): Over the past year Canada's economy has become better, become worse, or stayed about the same?
• PES11_6 (recoded to remove missing cases): Which party did you vote for?

These were taken from the 2011 Canadian Election Study dataset, available at: www.ces-eec.org
