• Understand how scholars operationalize concepts into observable variables they can use to test hypotheses.
• Learn the differences among the levels of quantitative measurement.



Who is the most powerful governmental leader in the world: the President of the United States, the President of France, or perhaps the King of Saudi Arabia? In which society do people enjoy the most liberty: the society where few crimes are committed and people can live carefree lives albeit under frequent state surveillance, or the society where the state provides minimal interference in people's lives but where people have much to fear from their fellow citizens?

The answer to the above questions about power and liberty probably depends on how one defines the key terms. Before scientists can study anything, they must define and describe that thing. The process of defining and specifically describing how to measure something is called operationalization. When scholars operationalize the variables they are using in their research, they determine precisely how to measure each variable.

The process of operationalization starts with some hazy idea of what a scholar would like to study and ends with a clear, easily communicated, consistent representation of that idea. Sometimes, this idea is readily observable or quite concrete, like the amount of legislation passed by the U.S. Congress last year. Other times, the idea is abstract and there is uncertainty and even confusion over how to define and describe it. Take, for example, the question at the start of this section: "Who is the most powerful governmental leader in the world?" To ascertain who is the most powerful, scholars must first make clear what they mean by "power." Power may be defined in terms of the capability of leaders to realize their policy goals without being checked by other people in their governments. Using such a definition, one might rank the President of France as more powerful than the President of the United States: as long as the French President controls a majority in the parliament, there are relatively few checks on his capacity to make policy decisions. Among other reasons, the French President's term in office is longer and the French Parliament lacks much of the ability of the U.S. Congress to influence legislation. An absolute, hereditary monarch like the King of Saudi Arabia may be seen as even more powerful because he serves for life without any formal restrictions on his rule. Alternatively, the economic and military strength of each leader's country may factor into how we define power, inflating our estimation of the President of the United States. As you can see, the question about who is the most powerful governmental leader must be preceded by a clear operationalization of "power."


The first stage of the operationalization process is often called conceptualization because scholars must first translate the hazy idea of what they want to study into specific "concepts." Concepts are the word(s), symbol(s) or phrase(s) used to provide meaning so that others understand the idea. Conceptualization is common in many philosophical works as authors attempt to clarify the meaning of ideas. For example, let's return to the question of what "liberty" means. Dissatisfied with the way liberty was construed in the Victorian Era, J.S. Mill attempted to conceptualize liberty in the late 1850s. In his essay "On Liberty" [1], Mill described how the concept of liberty evolved over time, before defining individual liberty as consisting of three distinct components: the liberty of thought, the liberty of tastes, and the freedom to plan our lives. These components of a broader concept are often called dimensions.

Once scholars have conceptualized their ideas, they can identify observable indicators of those concepts. Depending on how abstract the concept is, the indicator may be a direct observation or an indirect reflection of the concept. Since both "liberty" and "power" are abstract terms, scholars must observe indirect reflections. For example, by operationalizing Mill's dimensions of liberty, we could ascertain whether citizens enjoy liberty of thought by looking at whether a country allows citizens the freedom of speech or whether the state carefully regulates what is published or spoken in public. Using this indicator, countries like Iran and China would rate low on a measure of liberty relative to North American and European democracies.

Indicators often require some simplification of the underlying concept or loss of meaning. If we relied solely on some measure of freedom of speech, we would clearly not be capturing the full meaning of liberty. As a result, in the social sciences, scholars frequently employ multiple indicators to best capture the multidimensional aspects of each concept or to assuage concerns that one indicator does not provide a very accurate representation of the underlying concept. The set of indicators used to describe a concept forms the operational definition of the concept.

Attributes and Instrumentation

We tend to comprehend all objects, including conceptual indicators, in terms of their attributes. Attributes are what we see when we make observations about an object. One simple example is eyes. We often describe eyes using their color. The attributes of eye color include blue, brown and hazel. We can describe conceptual indicators using similar terms. One attribute of Iran's political system is tough restrictions on press and religious freedoms. Based on these attributes, observers conclude that people in Iran have low levels of liberty relative to a country like the United States.

The last step in operationalization is instrumentation, which is the process of constructing a tool (or instrument) to assign consistent values to particular attributes. In this process, identical or similar attributes are categorized together. These categories should be both exhaustive and mutually exclusive. In other words, every attribute can be assigned to one, and only one category. These categories then receive a label called a "value", using the language of quantitative researchers even though such values are not necessarily numerical. Some scholars will differentiate between categorical values that are assigned by the researcher, like a rating given to Iran based on the level of liberty enjoyed by its citizens, and quantitative values that have an inherent mathematical value like the number of votes received by a candidate for electoral office.
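The idea of an instrument that assigns exactly one value to every observed attribute can be sketched in a few lines of Python. This is only an illustration: the eye-color categories come from the example above, but the numeric codes, the "other" catch-all, and the function name `assign_value` are assumptions made for the sketch.

```python
# Hypothetical instrument: maps observed attributes (eye colors) to values.
# The numeric codes are arbitrary labels chosen for this illustration.
INSTRUMENT = {
    "blue": 1,
    "brown": 2,
    "hazel": 3,
    "other": 4,  # catch-all category that keeps the scheme exhaustive
}


def assign_value(attribute: str) -> int:
    """Return the single value assigned to an observed attribute."""
    # Each key maps to exactly one value, so categories are mutually exclusive;
    # falling back to "other" means every attribute receives some value.
    return INSTRUMENT.get(attribute, INSTRUMENT["other"])


observations = ["brown", "blue", "green", "hazel", "brown"]
values = [assign_value(a) for a in observations]
print(values)  # [2, 1, 4, 3, 2]
```

Because "green" is not one of the named categories, it falls into "other"; without that catch-all the scheme would not be exhaustive.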

Instrumentation is especially important to quantitative researchers, whose analyses require precise, standardized units of comparison. Common social scientific instruments include opinion surveys, census data and event counts. These instruments provide numerical values for the indicator attributes scholars observe. Variables, the subject matter of every social scientific analysis, have at least two different values.

Example: Civil liberties around the world

One frequently reported measure of the liberties enjoyed by people in different countries around the world is produced by an NGO called Freedom House ([2]). Freedom House publishes an annual report called Freedom in the World. In the report, a team of analysts rates the rights and freedoms enjoyed by citizens in 194 different countries and 14 territories. Freedoms are conceptualized using standards derived from the Universal Declaration of Human Rights. Civil liberties are operationalized using fifteen questions about each country or territory that tap into four dimensions: 1) freedom of expression and belief, 2) associational and organizational rights, 3) personal autonomy and individual rights, and 4) the rule of law. At one extreme are countries and territories that strive for equality of opportunity and whose citizens enjoy freedom of expression, assembly, association, education and religion and can freely engage in a range of economic activities, protected by a fair system of laws including an independent judiciary. These free countries include Canada and the United States, which were given a value of one in 2010. At the other extreme, Freedom House analysts gave countries with few or no civil liberties a score of seven. In 2010, Burma (Myanmar) and Tibet received sevens. Two countries mentioned above for poor civil liberties, Iran and China, both received scores of six.

Levels of Measurement

Some values are capable of describing attributes mathematically more precisely than other values. Measuring distance in millimeters is more precise than measuring in centimeters (excluding the use of decimals). Similarly, measuring total annual household income in thousands of dollars is more precise than breaking households into categories of "low income," "middle income," or "high income." Measuring race as "American Indian or Alaska Native, Asian, Black or African American, Hispanic or Latino, Native Hawaiian or Other Pacific Islander, or White" is more precise than "African American, Hispanic or Latino, White, or Other."

Statisticians distinguish four levels of measurement based on their degree of precision. In ascending order of precision, they are nominal, ordinal, interval, and ratio. Because interval and ratio variables are statistically indistinguishable for many standard statistical tests, we discuss those two levels together. This is common practice in basic statistical texts, which call such measures "interval-ratio" or "continuous" because both can describe an infinite number of values. Because more precise levels of measurement permit a wider range of statistical tools, a general rule is to use the most precise measure that is feasible.


At the nominal level of measurement, the values assigned to a variable only represent different categories or classifications for that variable. Examples include many personal characteristics like race, ethnicity, religion or sex, geographic places like cities or countries, political parties, presidential approval (e.g., [3]), or people's opinions about the most important problems facing their country. These categories have no order. A higher number simply reflects the arbitrary choice of the researcher who designed the coding scheme. Dichotomous or binary variables with only two values are usually treated as nominal. As at other levels of measurement, categories should be exhaustive and mutually exclusive, even if this necessitates having a category called "other."

Examples of Nominal Level Measures: Political Party (Canada)

It is common to ask survey respondents who they intend to vote for in the next election or for political scientists to record the names of parties that won seats in Parliament. A list of candidates or political parties is a nominal level variable. For example, in Canada there are five federal political parties with seats in Parliament: the Conservatives, the Liberals, the NDP, the Greens and the Bloc Quebecois.

Remember, the order is arbitrary, so we could assign values to the parties in alphabetical order:

Bloc Quebecois = 1
Conservatives = 2
Greens = 3
Liberals = 4
NDP = 5

Since the order is arbitrary, we could also assign values to the parties based on the number of seats they hold in 2011:

Conservatives = 1
NDP = 2
Liberals = 3
Bloc Quebecois = 4
Greens = 5

It makes no difference which order we present these categories since the numerical values simply represent different categories. It does not even matter which value we assign to each category, so Conservatives could be 10, and the NDP could be 13 and the Liberals 17. The limited amount of information about the categories represented by each value is why nominal measures are the lowest level of measurement precision.

With regard to measures of central tendency, only the mode, the category which occurs most often, may be used. The median, the point which divides the data into the upper half and lower half, and the mean, or mathematical average, cannot be meaningfully calculated.
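A short Python sketch may make the point concrete. The ten survey responses below are invented for illustration; the two coding schemes are the alphabetical and seat-count orderings listed above. The modal category is the same whichever scheme is used, while the "mean" changes with the arbitrary coding, which is why it carries no meaning for nominal data.

```python
from collections import Counter
from statistics import mean

# Hypothetical vote intentions from ten respondents (invented for illustration).
responses = [
    "Liberals", "NDP", "Conservatives", "Conservatives", "Greens",
    "NDP", "Conservatives", "Bloc Quebecois", "Liberals", "Conservatives",
]

# Two equally valid (because arbitrary) coding schemes from the text.
alphabetical = {"Bloc Quebecois": 1, "Conservatives": 2, "Greens": 3,
                "Liberals": 4, "NDP": 5}
by_seats = {"Conservatives": 1, "NDP": 2, "Liberals": 3,
            "Bloc Quebecois": 4, "Greens": 5}

# The mode is computed on the category labels, so coding cannot affect it.
mode_label = Counter(responses).most_common(1)[0][0]

# The "mean" depends entirely on which arbitrary numbers were assigned.
mean_alphabetical = mean([alphabetical[r] for r in responses])
mean_by_seats = mean([by_seats[r] for r in responses])

print("Mode:", mode_label)
print("Mean under alphabetical coding:", mean_alphabetical)
print("Mean under seat-count coding:", mean_by_seats)
```

The two means differ even though the underlying responses are identical, so neither number describes the data.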

Examples of Nominal Level Measures: Dichotomous Variables

Scholars typically treat dichotomous (or binary) variables that have only two categories as nominal. For example, sex would be:

Female = 1
Male = 2

Since the order and the values assigned to the categories are arbitrary, remember that male could also be listed first:

Male = 1
Female = 2

Since the values assigned to each category are also an arbitrary decision by the researcher, sex could also be categorized as:

Male = 0
Female = 1


The categories of an ordinal variable order or rank observations from low to high. Ordinal level measurements allow the researcher to determine which observations have more or less of whatever the variable measures. However, ordinal variables cannot communicate how much more (or less) one observation has relative to another observation, because the distance between categories may not be specified or consistent. Many ordinal level measures are discrete, with limits on how high (and/or low) values can go. Like nominal level measures, ordinal categories should be exhaustive and mutually exclusive. What makes ordinal variables different from nominal variables is that, because the categories are ordered, each category can be said to have more or less of the thing or concept measured by the variable.

Examples of Ordinal Level Measures: Education

Completed 8th grade or less = 1
Completed some high school = 2
High school graduate = 3
Completed some college or university = 4
College or university graduate = 5

Coding education as ordinal allows the researcher to state that an individual whose educational level is "5" (college or university graduate) has more education than someone whose educational level is "2" (completed some high school), but not precisely how much more. That is, the first individual who has graduated university cannot be said to have "three more units" of education than the second individual who has only completed some high school.

Examples of Ordinal Level Measures: Satisfaction Scales

Other examples of ordinal level measurement include satisfaction scales. For example, in June 2004, Princeton Survey Research Associates International conducted a poll for the Kaiser Foundation which asked Americans about their level of satisfaction with the availability and affordability of health care in America (among other issues). The original question (available through the Roper Center [4]) was:

Next I'd like you to rate your satisfaction with the state of the nation in some different areas. For each of the following, please tell me whether you are very satisfied, somewhat satisfied, not too satisfied, or not at all satisfied. (First,) how about the availability and affordability of health care. Are you very satisfied with this area, somewhat satisfied, not too satisfied, or not at all satisfied?

Very satisfied = 1
Somewhat satisfied = 2
Not too satisfied = 3
Not at all satisfied = 4

We know that someone who chooses "not at all satisfied (4)" is less satisfied than someone who chooses "somewhat satisfied (2)," but we cannot confidently say that the first individual is "twice as dissatisfied" based on the values assigned to each category. In fact, as long as the order is preserved, the actual values do not matter, so we could reverse the scale and start the scale at zero instead of one, like this:

Not at all satisfied = 0
Not too satisfied = 1
Somewhat satisfied = 2
Very satisfied = 3

Ordinal measures are more precise than nominal measures, so while we can continue to report the mode (the category which occurs most often), we can also report other measures of central tendency. The median (the "middle" observation which divides the data into the upper half and lower half) is most commonly and appropriately used, but when there are a large number of categories, scholars may also calculate the mean.
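As a rough illustration, the Python sketch below finds the median category for a handful of invented responses to the satisfaction question, once under each of the two codings shown above. Using `statistics.median_low` is a choice made for this sketch: it always returns a value that is actually one of the categories, which suits ordinal data.

```python
from statistics import median_low

# The two order-preserving codings shown in the text.
original_coding = {"Very satisfied": 1, "Somewhat satisfied": 2,
                   "Not too satisfied": 3, "Not at all satisfied": 4}
reversed_coding = {"Not at all satisfied": 0, "Not too satisfied": 1,
                   "Somewhat satisfied": 2, "Very satisfied": 3}

# Hypothetical responses from seven people (invented for illustration).
responses = [
    "Somewhat satisfied", "Not too satisfied", "Very satisfied",
    "Not at all satisfied", "Not too satisfied", "Somewhat satisfied",
    "Not too satisfied",
]


def median_category(coding: dict, answers: list) -> str:
    """Return the label of the median response under a given coding."""
    label_for = {value: label for label, value in coding.items()}
    return label_for[median_low([coding[a] for a in answers])]


median_original = median_category(original_coding, responses)
median_reversed = median_category(reversed_coding, responses)
print(median_original, "|", median_reversed)
```

Both codings identify the same median category, because the median depends only on the order of the categories and not on the particular numbers attached to them.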

A cautionary example of an apparently ordinal measure that is actually nominal

Variables that look ordinal often are not. It takes just one unordered category to render the variable nominal. Above, we presented an example of education as a variable measured at the ordinal level. On some surveys, education is not ordinal. For example, the 2008 Canadian Election Survey asked survey respondents, "What is the highest level of education that you have completed?" Respondents could choose from the following answers:

1. No schooling
2. Some elementary school
3. Completed elementary school
4. Some secondary / high school
5. Completed secondary / high school
6. Some technical, CAAT, CEGEP,...
7. Completed technical, CAAT, CEGEP,...
8. Some university
9. Bachelor's degree
10. Master's degree
11. Professional degree or doctorate

Notice that while the variable appears to be ordinal, with "no schooling" appearing before "some elementary school," "completed elementary school" and so on, the order breaks down after secondary / high school education. Category #6 is "some technical...", #7 is "completed technical...", and #8 is "some university." "Some university" could be just a single semester of classes, arguably less than a full year or two of technical school. It is not clear that "some university" is a higher level of educational attainment than "completed technical." Similarly, #10 is a master's degree, which is less than a doctorate (#11), but may not be any less of an educational accomplishment than a professional degree (also #11) like a law or dental degree. As a result of these ambiguous categories, this measure must be considered nominal. However, using a statistics program, one could transform it into an ordinal level variable by collapsing the post-secondary levels of educational attainment short of a bachelor's degree (#6, #7 and #8) and combining the category for master's degree (#10) with the category for professional degree or doctorate (#11). The new ordinal variable would have eight categories and look like this:

1. No schooling
2. Some elementary school
3. Completed elementary school
4. Some secondary / high school
5. Completed secondary / high school
6. Some post-secondary education (university or technical)
7. Bachelor's degree
8. Post graduate degree (master's, doctorate or professional).

This is an important transformation because if the variable is nominal, one could not report the median level of educational achievement.
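The recoding described above can be sketched in Python as a simple lookup table from the eleven original codes to the eight collapsed ones. The mapping follows the text; the sample of respondents is invented for illustration.

```python
from statistics import median_low

# Map each original survey code (1-11) to the collapsed ordinal code (1-8).
COLLAPSE = {
    1: 1,   # No schooling
    2: 2,   # Some elementary school
    3: 3,   # Completed elementary school
    4: 4,   # Some secondary / high school
    5: 5,   # Completed secondary / high school
    6: 6,   # Some technical...      -> some post-secondary education
    7: 6,   # Completed technical... -> some post-secondary education
    8: 6,   # Some university        -> some post-secondary education
    9: 7,   # Bachelor's degree
    10: 8,  # Master's degree        -> post graduate degree
    11: 8,  # Professional degree or doctorate -> post graduate degree
}

# Hypothetical original codes for nine respondents (invented for illustration).
original_codes = [5, 8, 9, 7, 11, 6, 5, 10, 9]

recoded = [COLLAPSE[code] for code in original_codes]
median_education = median_low(recoded)

print(recoded)           # [5, 6, 7, 6, 8, 6, 5, 8, 7]
print(median_education)  # 6, i.e. "some post-secondary education"
```

Only after the recoding does the median have a defensible interpretation, since the collapsed categories are unambiguously ordered.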

Interval and Ratio

Interval level measures allow the researcher to determine whether one category contains more or less of a variable's attributes than another category and how much more or less. The intervals between categories or values are uniform in size. What this means is that the difference between 60 degrees Fahrenheit and 67 degrees Fahrenheit is the same as the difference between 72 degrees Fahrenheit and 79 degrees Fahrenheit. This is true even if no observation in the data set is assigned a value within that space. Ratio measures are similar except they have a meaningful zero that indicates a complete absence of the attribute being measured. With such measures, the proportions between values are uniform (hence the name "ratio variable"). Age measured in years is a ratio variable, as are event counts like the number of wars a country has fought. No observation can have fewer than zero units of such a variable.

The mean is the most appropriate measure of central tendency for interval and ratio data unless the data are skewed. All of the other measures of central tendency can also be calculated.

Examples of Interval Level Measures

Temperature coded as exact temperature

Temperatures from Cities Around the World

City                              Temperature
Austin, TX, USA                   99°F
Budapest, Hungary                 89°F
Longyearbyen, Svalbard, Norway    43°F
Quito, Ecuador                    68°F

Although there are no observations between 43° and 68°, we can say that the temperature in Quito is exactly 25° warmer than the temperature in Longyearbyen. However, the temperature in Budapest cannot be said to be slightly more than "twice" as warm as the temperature in Longyearbyen, because the Fahrenheit temperature scale has no meaningful zero (the Kelvin and the infrequently used Rankine scales are the only temperature scales with a meaningful zero).
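A brief Python sketch shows why the ratio is not meaningful on the Fahrenheit scale. It converts the Budapest and Longyearbyen temperatures to Kelvin, which has a true zero, using the standard conversion formula; the function and variable names are chosen for this illustration.

```python
def fahrenheit_to_kelvin(degrees_f: float) -> float:
    """Convert a Fahrenheit temperature to Kelvin (standard formula)."""
    return (degrees_f - 32.0) * 5.0 / 9.0 + 273.15


temps_f = {"Austin": 99.0, "Budapest": 89.0, "Longyearbyen": 43.0, "Quito": 68.0}

# Differences are meaningful on an interval scale.
difference_f = temps_f["Quito"] - temps_f["Longyearbyen"]

# Ratios are not: the answer changes when the zero point changes.
ratio_f = temps_f["Budapest"] / temps_f["Longyearbyen"]
ratio_k = (fahrenheit_to_kelvin(temps_f["Budapest"])
           / fahrenheit_to_kelvin(temps_f["Longyearbyen"]))

print(f"Quito is {difference_f:.0f} degrees F warmer than Longyearbyen")
print(f"Budapest / Longyearbyen in Fahrenheit: {ratio_f:.2f}")
print(f"Budapest / Longyearbyen in Kelvin:     {ratio_k:.2f}")
```

In Fahrenheit the ratio is a little over 2, but in Kelvin it is only about 1.09, so "twice as warm" is an artifact of where the Fahrenheit scale happens to place its zero.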

Examples of Ratio Level Measures

Number of troops deployed

Number of years served as a Supreme Court Justice

A deployment of 60,000 troops is not only 20,000 troops larger than a deployment of 40,000, but it can also be said to be 50% larger. It can also be said to be twice as large as a deployment of 30,000. Similarly, 10 years served on the Supreme Court is 5 years more than a tenure of only 5 years, and 100% longer.


Nature of Measurement

We don't always have a choice about which level of measurement to use. If we did, we would always use the most precise measure available. However, data from the real world don't always fit our statistical needs. Sex, religion, party affiliation, and race or ethnicity will always be nominal. Strength of partisanship and degree of satisfaction will nearly always be ordinal. There are also times when an interval measure could be used, such as for household income, but because of the nature of surveys, we usually measure this variable in categories when using this data gathering technique.

Measures may also be described as discrete or continuous.

Discrete Measures

Discrete measures have naturally separate points that cannot be further subdivided. Nominal level measures and ordinal level measures are naturally discrete. However, many interval or ratio level measures are discrete as well. No matter how many runners are left on base, you cannot have 4.25 runs in a softball game. Even if you did not vote in every race on the ballot, you either did or did not participate in the election. A person cannot be said to have participated in 12.3 elections.

Continuous Measures

Continuous measures consist of gradations that, in principle, can be infinitely subdivided. Income could be divided into thousands, hundreds, tens, ones, tenths, hundredths, thousandths, etc. Age can be divided into years, months, weeks, days, hours, minutes, seconds, tenths of seconds, etc.


Accuracy concerns the question of how we measure our variables and how these measures relate to our theoretical concepts. The variables we choose to measure and how we choose to measure them must match the parameters of the theory we are seeking to test. A chief problem of accuracy in measurement is ensuring that the relationship between your theoretical concepts and the measurement of your variables is such that the relationship between your measured variables truly reflects the relationship between your theoretical concepts. In order to be accurate, a measure must be reliable and valid.

Example: Social Status

Social status is a concept often applied in the social and behavioral sciences. Unfortunately, it can often be a subjective concept. Depending on the theory you are testing, your measure of an individual's social status may include elements of such varying measures as income, occupational prestige, level of education, community standing, community involvement, etc. Which of these specific elements you choose to measure depends on the theory you are testing. Further, even when using a measure such as income, failure to account for cross-regional differences in cost of living could result in different classifications for individuals who hold relatively similar economic positions in their respective communities.

Example: Presidential Legislative Effectiveness

A student might want to evaluate the factors that contribute to the U.S. president's legislative effectiveness. One might measure legislative effectiveness by the number of the president's proposals which were passed into law by Congress. Among the factors that one might hypothesize would impact a president's legislative effectiveness is the influence of that president. Certainly, public approval ratings contribute to a president's influence, but what else? Richard Neustadt has noted that a president's power is determined by his "power to persuade." Deciding how you intend to measure a president's "persuasiveness" would greatly impact the validity of the test of your theory.


A measure is reliable when repeated applications yield the same results; that is, a reliable measure will produce the same answer each time it is applied to a particular object.

Example: Reliable and Unreliable Measures

If one desired to measure the size of one's methodology classroom, one might choose to use either a tape measure or one's own "strides" or "paces." Because the lengths of the units of measurement on the tape measure have been standardized, you should get the same result each time you use the tape measure to measure the room. However, because the length of one's stride varies not only from person to person but also slightly with any given stride from the same person, you are unlikely to achieve perfectly identical measures by repeatedly pacing off the distance from one side of the room to the other. Therefore, measuring the room in feet and inches using a tape measure will be more reliable than measuring the room in "paces."

Tests for Reliability

Test-retest is applying the same measure a second time to see whether the observer obtains the same results. Note: before drawing conclusions based on a test-retest method, the observer must consider whether any observed differences are due to the quality of the measure or the consistency of the application of that measure.

Internal consistency (sometimes called a "split-half check") is usually examined when multiple measures are used to measure the same concept. For example, multiple, differently worded items in a survey might seek to measure latent racism, or multiple knowledge and interest questions may be combined to create a political knowledge scale. These items taken separately should reflect the assessments as determined collectively.
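One simple way to run a split-half check is to split the items into two halves, total each half for every respondent, and correlate the two sets of totals. The Python sketch below does this with a hand-written Pearson correlation. The item scores are invented for illustration, and splitting into odd- and even-numbered items is just one of several possible ways to form the halves.

```python
from math import sqrt

# Hypothetical scores: six respondents answering four items on a 1-5 scale.
item_scores = [
    [5, 4, 5, 4],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [1, 2, 1, 1],
    [4, 5, 4, 5],
    [3, 2, 3, 3],
]


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Total the first and third items, and the second and fourth, per respondent.
first_half = [row[0] + row[2] for row in item_scores]
second_half = [row[1] + row[3] for row in item_scores]

split_half_r = pearson(first_half, second_half)
print(f"Split-half correlation: {split_half_r:.2f}")
```

A correlation close to 1 suggests the two halves are measuring the same thing consistently; a low correlation suggests the items do not hang together as a single scale.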


A measure is valid when it measures what it claims to measure, or is supposed to measure. To the degree that the measure truly mirrors the concept drawn from one's theory, that measure will be valid for testing that theory.

NOTE: if a measure is valid then it will also be reliable. However, a measure can be reliable without being valid.
For example: if you use a properly marked quart container to measure how many liters of water are contained in a vessel, then you will get the same answer each time; therefore, the measure will be reliable. However, since the container is marked for quarts and not liters, your measure will not be valid.

Random versus Non-random Error

One way to understand the difference between reliability and validity is to examine the difference between random and non-random error.

Random error occurs primarily from mistakes. These may be coding errors or errors from respondents that are not systematic, but rather stochastic (random and non-deterministic). There is no pattern to random error, and an error on the "high side" of a measure is just as likely as an error on the "low side" of a measure. With a large enough sample size, these errors tend to average out and will not fatally skew your results.

Non-random error is often systemic, in that it permeates the entire system, and systematic, in that it happens repeatedly in the same way. Over the long run, this biased measure will skew your distribution, and any research conclusions drawn from the application of such measures will be fatally flawed. An example of non-random error may be seen in self-reported voting behavior. To the degree that self-reported voting behavior is in error as reported by survey respondents, the error is nearly universally to inflate the number of times an individual has voted in recent elections. This behavior may be traced to social desirability (the tendency of survey respondents to report behavior or attitudes that they feel will be positively viewed by others).

It may be helpful to illustrate this concept with a series of graphs. In the following graph, a concept is measured repeatedly with low levels of both random and non-random error; each measurement is mostly error-free.

[Figure unavailable: repeated measurements with low random error and low non-random error]

In this graph, we illustrate non-random error or bias; here, there is a systematic error in measurement, even though there isn't much variation in the measurement each time we take it. A good example would be a mis-calibrated scale for measuring weight, which is always off by a certain amount.

[Figure unavailable: repeated measurements with low random error and high non-random error]

This graph demonstrates high random error but low non-random error. Here our measurement is fairly unreliable, but the average measurement is not biased at all.

[Figure unavailable: repeated measurements with high random error and low non-random error]

Finally, we can show high levels of both random and non-random error simultaneously. In this graph, not only is there not much consistency between the measures, but we also see a systematic bias in each of the individual measurements such that the average is still an inaccurate measure of the concept.

[Figure unavailable: repeated measurements with high random error and high non-random error]
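The four situations described above can also be explored with a small simulation. The Python sketch below is an illustration under stated assumptions: the "true" value of 50, the bias of 5, the noise levels of 1 and 10, and the use of normally distributed noise are all arbitrary choices made for the example, not values taken from the text.

```python
import random
from statistics import mean, pstdev

TRUE_VALUE = 50.0  # assumed true value of the concept being measured


def simulate(bias: float, noise_sd: float, n: int = 10_000, seed: int = 0) -> list:
    """Simulate n measurements with a fixed bias (non-random error)
    and normally distributed noise (random error)."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


scenarios = {
    "low random, low non-random": simulate(bias=0.0, noise_sd=1.0),
    "low random, high non-random": simulate(bias=5.0, noise_sd=1.0),
    "high random, low non-random": simulate(bias=0.0, noise_sd=10.0),
    "high random, high non-random": simulate(bias=5.0, noise_sd=10.0),
}

for name, measurements in scenarios.items():
    average_error = mean(measurements) - TRUE_VALUE
    spread = pstdev(measurements)
    print(f"{name}: average error = {average_error:+.2f}, spread = {spread:.2f}")
```

With many measurements, random error shows up as a large spread while the average stays close to the true value; non-random error shifts the average itself, and no amount of additional measurement removes it.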










Discussion questions

  1. How can you operationalize the concept of "power" to answer the question of which governmental leader is the most powerful?


