Univariate data visualization
Revision as of 06:52, 8 July 2011 by Emily Clough
Graphs are one of the clearest ways to convey information about a single variable to an audience, particularly an audience that is unfamiliar or uncomfortable with numbers. While many people claim that they do not understand statistics, they can look at a pie chart and get a sense of how a variable is distributed. This is why we so frequently see graphs in newspapers and on television: they are an easy and effective way to convey information. It is important, however, to think carefully about how you construct graphs. In particular, you must consider which kind of graph will best convey the information you are interested in.
Looking at a single variable: pie charts and bar charts
Pie charts and bar charts are two of the most common types of graph for conveying information about a single variable; because they display only one variable, they are called univariate graphs. Pie charts are the more common of the two, and the type most people are familiar with. However, each type offers different advantages in how the information is presented, and you should think carefully about these advantages before deciding which type to use.
Pie charts
Pie charts are quite common, and you have probably come across them in news reports of various kinds. A pie chart displays a single variable. That variable is divided into categories, and the proportion of the total cases that falls in each category is calculated. A pie chart is simply a circle divided into wedges whose sizes reflect the proportion of cases in each category.
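The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `pie_wedges` and the sample responses are hypothetical. Each wedge's angle is its category's proportion of the total cases multiplied by the 360 degrees of the full circle.

```python
from collections import Counter


def pie_wedges(responses):
    """Return {category: (proportion, angle_in_degrees)} for a list of category labels."""
    counts = Counter(responses)
    total = sum(counts.values())
    # A wedge's angle is its share of all cases times the 360 degrees of the circle.
    return {
        category: (count / total, 360 * count / total)
        for category, count in counts.items()
    }


# Hypothetical responses, invented for illustration only.
sample = ["A", "B", "A", "C", "A", "B", "A", "C"]
for category, (share, angle) in pie_wedges(sample).items():
    print(f"{category}: {share:.1%} of cases, {angle:.0f} degrees")
```

Because the proportions always sum to 1, the angles always sum to 360 degrees; in this invented sample, category A holds half the cases and so takes up half the circle.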
2010 British Parliamentary Elections: a pie chart
The first example we'll look at is the 2010 United Kingdom general election.
Bar chart
While you have probably seen a pie chart before, you may be less familiar with the bar chart. Bar charts are less commonly seen in the media, but they are very useful univariate graphs. Like a pie chart, a bar chart presents the distribution of a single variable. In a bar chart, the values of the variable are laid out along the horizontal axis of the graph, and each value has a bar whose height reflects the number of times that value occurred. For instance, in the example below the names of the political parties that ran in the 2010 United Kingdom general election are arrayed along the horizontal axis, and each bar above the axis reflects the number of votes that party received.
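The idea that each bar's length reflects how often a value occurred can be illustrated with a small text-only bar chart. This is a hypothetical sketch: the function `text_bar_chart`, the party names and the vote counts are invented for illustration and are not the 2010 election results. For ease of printing, the bars run horizontally rather than vertically, but the principle is the same: each bar is scaled in proportion to its count.

```python
def text_bar_chart(counts, width=40):
    """Render {category: count} as rows of '#' characters, one row per category."""
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    rows = []
    for label, count in counts.items():
        # Bar length is proportional to the count; the largest count fills `width`.
        bar = "#" * round(width * count / largest)
        rows.append(f"{label:<{label_width}} | {bar} {count}")
    return "\n".join(rows)


# Hypothetical vote counts for three invented parties.
votes = {"Party A": 120, "Party B": 90, "Party C": 30}
print(text_bar_chart(votes, width=20))
```

Comparing categories only requires comparing bar lengths: Party A's bar is four times as long as Party C's because it received four times as many votes.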
2010 British Parliamentary Elections: a bar chart
For this example, we are looking at exactly the same data that was used to create the pie chart above.
Pie charts v. bar charts: the relative advantages of each
Pie charts are used widely in the media, so students often turn to them when they need to represent information about a single variable. However, bar charts offer some advantages over pie charts, and it is important to understand when each type of chart is useful.
Type of variable
If you are looking at a nominal variable, there might not be any relative advantage to using a pie chart or a bar chart. The example used above, the political party voted for in the 2010 United Kingdom general election, is a nominal variable. Because the categories have no inherent order to them, it isn't important where the categories are placed relative to one another on either the pie chart or the bar chart.
An ordinal variable contains more information than a nominal variable: it also tells us the order in which the categories belong. A pie chart cannot show this order clearly, because a circle has no natural starting or ending point. In a bar chart, however, you can arrange the categories in order along the horizontal axis, which conveys this additional information about the variable.
If you want to use a ratio variable in a pie chart or a bar chart, you will probably need to divide it into discrete categories first. Take age: because the proportion of people at any single age is quite small, a pie chart of raw ages would end up with many tiny slivers. Instead, you might group ages into categories such as 18-25, 26-35, 36-45 and so on. You can then treat the grouped variable as an ordinal variable, as discussed above.
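Grouping a ratio variable into ordered categories can be sketched in Python as follows. The bins follow the pattern in the text (18-25, 26-35, 36-45 and so on); the upper bins, the open-ended "66+" category and the sample ages are assumptions made for this illustration, and ages are assumed to be whole years.

```python
from collections import Counter

# Hypothetical age bins following the pattern in the text; "66+" is an assumed top category.
BINS = [(18, 25), (26, 35), (36, 45), (46, 55), (56, 65)]
LABELS = [f"{low}-{high}" for low, high in BINS] + ["66+"]


def age_group(age):
    """Map an age in whole years to its category label, or None for ages under 18."""
    for low, high in BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "66+" if age > 65 else None


def group_counts(ages):
    """Count ages per category, listed in the categories' natural (ordinal) order.

    Ages under 18 map to None and are therefore left out of the listing.
    """
    counts = Counter(age_group(age) for age in ages)
    return [(label, counts.get(label, 0)) for label in LABELS]


# Hypothetical ages, invented for illustration.
ages = [19, 23, 31, 44, 52, 67, 29, 38]
print(group_counts(ages))
```

Because the labels are listed in their natural order, the result can be drawn as a bar chart with the age groups in order along the horizontal axis.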
Example: Pie chart v. bar chart for a Likert scale
One common example of an ordinal variable is a Likert scale. Likert scales are often used in political science to measure opinions about an issue. In this example, we use data from the Pew Center's November 2010 Post Election Survey.<ref>"November 2010-Post Election" (November 1, 2010). http://www.pewinternet.org/Shared-Content/Data-Sets/2010/November-2010--Post-Election.aspx. Retrieved July 7, 2011.</ref> The survey asks: "From what you know, do you strongly agree, agree, disagree or strongly disagree with the Tea Party movement, or don’t you have an opinion either way?" The possible responses are strongly agree, agree, no opinion either way, disagree, and strongly disagree. We have created both a pie chart and a bar chart from the responses to this question.
Both charts present the same information. However, the bar chart makes it easy to see that responses to the question cluster towards the centre of the scale; in other words, most respondents do not feel strongly about the Tea Party either way. This information is also present in the pie chart, but it is not conveyed as clearly. This is one demonstration of why a bar chart is often the better choice for an ordinal variable.
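One practical consequence is that the categories of an ordinal variable must be kept in the order of the scale before they are plotted; sorting the labels alphabetically, for example, scrambles a Likert scale. The Python sketch below is illustrative only: the response labels follow the Pew question quoted above, but the sample responses are invented and the function name `ordered_counts` is hypothetical.

```python
from collections import Counter

# The response options in the order of the scale, from the question quoted above.
LIKERT_ORDER = [
    "strongly agree",
    "agree",
    "no opinion either way",
    "disagree",
    "strongly disagree",
]


def ordered_counts(responses, order):
    """Count responses and return (category, count) pairs in the scale's order."""
    counts = Counter(responses)
    # Categories nobody chose are kept with a count of 0 so the full scale is shown.
    return [(category, counts.get(category, 0)) for category in order]


# Hypothetical responses, invented for illustration.
responses = ["agree", "disagree", "agree", "strongly agree",
             "no opinion either way", "agree"]
print(sorted(Counter(responses)))  # alphabetical order breaks the scale
print(ordered_counts(responses, LIKERT_ORDER))
```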
Comparing categories v. proportion of the whole
You should think carefully about what you would like the audience to learn from your graph. Sometimes you want people to be able to compare categories with one another easily; for instance, if you are interested in the distribution of votes in a plurality election, the important thing to know is who received the most votes. In this case, it is better to use a bar chart. To compare categories in a bar chart, you need only compare the heights of the bars; to compare categories in a pie chart, you need to compare the areas of the wedges. Judging relative area is more difficult than judging relative height.
At other times, it is more important to know what proportion of the whole each category makes up. For instance, if you were interested in the distribution of the vote in a majoritarian democracy, what matters is whether any party received more than 50% of the vote. When you want to show the proportion of the whole that falls in a certain category, a pie chart is more appropriate: a quick look at the pie chart shows whether any one wedge takes up more than half of the circle. In a bar chart, each category's share of the whole is not shown directly, since the bars are not drawn relative to the total.
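The two questions in this section, which category is largest and whether any category holds more than half of all cases, can be written out explicitly. This is a hypothetical sketch with invented vote totals; the function names `plurality_winner` and `majority_winner` are not from the original text.

```python
def plurality_winner(counts):
    """Return the category with the most cases (the comparison a bar chart makes easy)."""
    return max(counts, key=counts.get)


def majority_winner(counts):
    """Return the category with more than 50% of all cases, or None if there is none."""
    total = sum(counts.values())
    for category, count in counts.items():
        if count > total / 2:  # strictly more than half
            return category
    return None


# Hypothetical vote totals, invented for illustration.
votes = {"Party A": 450, "Party B": 400, "Party C": 150}
print(plurality_winner(votes))  # Party A
print(majority_winner(votes))   # None: Party A has 45%, not a majority
```

Here Party A wins a plurality, which the tallest bar would show, but not a majority, which a pie chart makes easy to see because no wedge covers more than half the circle.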
Designing graphs: maximising information transmission
Spreadsheet programs such as Excel make it very easy to add "exciting" effects to your graph. However, whenever you add an element or effect to a graph, you should think carefully about whether that element is helping you to convey information, or whether it is actually making information more difficult to interpret. Edward Tufte<ref>Tufte, Edward (1983). The Visual Display of Quantitative Information. Graphics Press.</ref><ref>"The Work of Edward Tufte and Graphics Press". http://www.edwardtufte.com/tufte/index.</ref> claims that many of the modifications people make to graphs actually obscure the information being presented.
One of the most common modifications made to graphs is to add a three-dimensional effect. Below is the bar chart showing the distribution of support for the Tea Party, together with the same bar chart with a three-dimensional effect added. These graphs use exactly the same data, but they appear quite different. For instance, the regular bar chart shows clearly that more than 200 people in the sample strongly disagree with the Tea Party, but the three-dimensional graph makes it appear that fewer than 200 do. The same is true of the "Agree" category: the regular bar chart makes it clear that about 400 people in the sample agree with the Tea Party, while the three-dimensional graph makes it appear that there are fewer than 400. In three-dimensional charts of this kind the gridlines are typically drawn on the back wall of the chart while the bars stand in front of it, so the top of each bar can appear to sit below the value it represents. This example illustrates why it is so important to consider carefully what kinds of effects should be added to your graph.