# Objectives

• Understand the levels of measurement and what makes a measure accurate, reliable, and valid.

# Levels of Measurement

Precision refers to how fine-grained a measure is. Measuring in millimeters is more precise than measuring in centimeters (excluding the use of decimals). Measuring income as "total annual household income in dollars" is more precise than breaking people into the categories of "low income," "middle income," or "high income." Measuring race as "American Indian or Alaska Native, Asian, Black or African American, Hispanic or Latino, Native Hawaiian or Other Pacific Islander, or White" is more precise than "African American, Hispanic or Latino, White, or Other." A general rule is: use the most precise measure possible.

With regard to precision, there are four Levels of Measurement: Nominal, Ordinal, Interval, and Ratio. The principal difference between Interval and Ratio is that Ratio measures have a meaningful zero. For example, while you can have temperatures lower than zero on the Fahrenheit or Celsius scale, you cannot have fewer than zero runs in a baseball game. While this distinction is obviously important in analyses using proportions and, naturally, ratios, the two levels are statistically indistinguishable for many standard statistical tests. Most basic statistical texts therefore speak of three levels of measurement: Nominal, Ordinal, and Interval, with the final level sometimes called Interval-Ratio.

## Nominal

At the nominal level of measurement, the values assigned to a variable only represent different categories or classifications of that variable. The order of alternatives does not matter. A higher number simply reflects the arbitrary choice of the researcher who designed the coding scheme. Categories should be exhaustive and mutually exclusive. That is, there should be a category for every observation and each observation should fall into one, and only one, category.

### Examples of Nominal Level Measures

Sex: the choice to code "sex" as

Female = 1
Male = 2

versus

Male = 1
Female = 2

makes no statistical difference; the numerical values simply represent different categories.

Other examples of nominal level measures, where the value assigned to the variable simply represents a different category or classification, include Country of Residence, Race or Ethnicity, and University Attended.

With regard to measures of central tendency, only the mode, the category which occurs most often, may be used. The median, the point which divides the data into the upper half and lower half, and the mean, or mathematical average, cannot be calculated.
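As a minimal sketch of why only the mode applies to nominal data, the following Python snippet counts categories in a small, hypothetical list of responses. The data, the two coding schemes, and all variable names are illustrative assumptions. The modal category is the same however the categories are numbered, while a "mean" of the codes changes with the arbitrary numbering.

```python
from collections import Counter

# Hypothetical nominal data: country of residence for six respondents.
responses = ["Norway", "Ecuador", "Norway", "Hungary", "Norway", "Ecuador"]

# The mode is the category that occurs most often.
counts = Counter(responses)
mode_category, mode_count = counts.most_common(1)[0]
print("Mode:", mode_category, "with", mode_count, "observations")

# Two arbitrary (hypothetical) coding schemes for the same categories.
scheme_a = {"Norway": 1, "Ecuador": 2, "Hungary": 3}
scheme_b = {"Hungary": 1, "Norway": 2, "Ecuador": 3}

# A "mean" of the codes depends entirely on the arbitrary numbering,
# which is why it carries no meaning for nominal data.
mean_a = sum(scheme_a[r] for r in responses) / len(responses)
mean_b = sum(scheme_b[r] for r in responses) / len(responses)
print("Mean of codes under scheme A:", round(mean_a, 2))
print("Mean of codes under scheme B:", round(mean_b, 2))
```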

## Ordinal

The ordinal level of measurement allows the researcher to determine more or less of a variable and make comparisons on which observations have more or less of that variable. However, you cannot determine how much more or less of a variable one observation has relative to another. Categories should be exhaustive and mutually exclusive, but they also have some inherent relationship. One category is related to all others in that an observation which falls into another category can be said to have more or less of the thing or concept measured by the variable.

### Examples of Ordinal Level Measures

Education: Coding education as

Completed 8th grade or less = 1
Completed some high school = 2
Completed high school = 3
Completed some college = 4
College graduate = 5

allows the researcher to state that an individual whose educational level is "5" (College graduate) has more education than someone whose educational level is "2" (Completed some high school), but not precisely how much more. That is, the first individual cannot be said to have "three more units" of education than the second individual.

Other examples of ordinal level measurement include subjective economic class, such as "working class, middle class, and upper class," or satisfaction scales. Consider the following scale:

Very Dissatisfied = 1
Somewhat Dissatisfied = 2
Somewhat Satisfied = 3
Very Satisfied = 4

We know that someone who chooses Very Satisfied (4) is more satisfied than someone who chooses Somewhat Dissatisfied (2), but it is not logical to say that the first individual is "twice as satisfied."

With regard to measures of central tendency, only the mode (the category which occurs most often) and the median (the point which divides the data into the upper half and lower half) may be used. The mean, or mathematical average, cannot be calculated.
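The following is a hedged sketch of finding the mode and median for the four-point satisfaction scale, using Python's standard `statistics` module. The list of responses is made up for illustration. `median_low` is used here as one reasonable choice because it always returns a code that actually exists on the scale, rather than a value halfway between two categories.

```python
import statistics

# The four-point satisfaction scale from the text.
labels = {
    1: "Very Dissatisfied",
    2: "Somewhat Dissatisfied",
    3: "Somewhat Satisfied",
    4: "Very Satisfied",
}

# Hypothetical responses from eight survey participants.
responses = [1, 1, 1, 2, 3, 3, 4, 4]

# Mode: the most frequently chosen category.
mode_code = statistics.mode(responses)

# Median: the middle category once responses are put in order.
# median_low never returns a value that falls between two categories.
median_code = statistics.median_low(responses)

print("Mode:", labels[mode_code])
print("Median:", labels[median_code])
```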

## Interval

Interval level measures allow the researcher to not only determine more or less of a variable, but how much more or less. The intervals between categories or values have meaning. This is true even if no observation in the data set is assigned a value within that space. Categories are exhaustive, mutually exclusive, have some inherent relationship, and the nature of the relationship between categories is exact and known.

### Examples of Interval Level Measures

Temperature coded as exact temperature

| City | Temperature |
|---|---|
| Austin, TX, USA | 99°F |
| Budapest, Hungary | 89°F |
| Longyearbyen, Svalbard, Norway | 43°F |
| Quito, Ecuador | 68°F |

Although there are no observations between 43° and 68°, we can say that the temperature in Quito is exactly 25° warmer than the temperature in Longyearbyen. However, the temperature in Budapest cannot be said to be slightly more than "twice" as warm as the temperature in Longyearbyen, because the Fahrenheit temperature scale has no meaningful zero (leaving aside, for the moment, a discussion of Absolute Zero on the Kelvin scale). Since it is possible for some city to have a temperature below 0°, such a statement does not make sense.
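One way to see why "twice as warm" fails is to re-express the same temperatures in Celsius. The sketch below uses the standard Fahrenheit-to-Celsius conversion; the function and variable names are illustrative. The apparent ratio between Budapest and Longyearbyen changes with the unit, which a meaningful ratio would not do.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32) * 5 / 9

# Temperatures from the table in the text, in degrees Fahrenheit.
budapest_f, longyearbyen_f, quito_f = 89, 43, 68

# Differences are meaningful on an interval scale.
diff_f = quito_f - longyearbyen_f

# Ratios are not: the apparent "ratio" depends on the unit chosen.
ratio_f = budapest_f / longyearbyen_f
ratio_c = fahrenheit_to_celsius(budapest_f) / fahrenheit_to_celsius(longyearbyen_f)

print("Quito minus Longyearbyen:", diff_f, "Fahrenheit degrees")
print("Budapest / Longyearbyen in Fahrenheit:", round(ratio_f, 2))
print("Budapest / Longyearbyen in Celsius:", round(ratio_c, 2))
```

In Fahrenheit the ratio is about 2.07, while in Celsius it is about 5.18 for the very same pair of cities.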

With regard to measures of central tendency, the mode, median, and mean may all be calculated.

## Ratio

A ratio level measure may be thought of as an interval level measure which has a meaningful zero point. There cannot be observations that have fewer than zero units of that variable. This allows proportions and ratios to be calculated.

### Examples of Ratio Level Measures

Number of troops deployed

Number of years served on the bench for Supreme Court Justices

Number of elections voted in

60,000 troops is not only 20,000 more troops than a 40,000 troop deployment, but it can be said to be 50% larger. It can also be said to be twice as many troops as a deployment of 30,000.
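The arithmetic behind these statements can be sketched in a few lines of Python. The variable names are illustrative. The last step re-expresses the counts in thousands to show that, unlike the temperature example, the ratio does not change with the unit.

```python
# Troop deployments from the example in the text.
deployment_a = 60_000
deployment_b = 40_000
deployment_c = 30_000

# Difference, as with an interval measure.
difference = deployment_a - deployment_b

# Proportional comparisons, meaningful because zero troops means "none".
percent_larger = (deployment_a / deployment_b - 1) * 100
times_as_many = deployment_a / deployment_c

# Changing the unit (troops to thousands of troops) leaves the ratio intact.
ratio_in_troops = deployment_a / deployment_b
ratio_in_thousands = (deployment_a / 1000) / (deployment_b / 1000)

print(difference, percent_larger, times_as_many)
print(ratio_in_troops == ratio_in_thousands)
```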

With regard to measures of central tendency, the mode, median, and mean may all be calculated.

# Nature of Measurement

We don't always have a choice about which level of measurement to use. If we did, we would always use the most precise measure available. However, data from the real world don't always fit our statistical needs. Sex, Religion, Party Affiliation, and Race or Ethnicity will always be Nominal. Strength of partisanship and degree of satisfaction will nearly always be Ordinal. There are also times when an Interval measure could be used, such as for household income, but because of the nature of surveys, we usually measure this variable in categories when using this data gathering technique.

Measures may also be described as Discrete or Continuous.

## Discrete Measures

Discrete measures have naturally separate points that cannot be further subdivided. Nominal level measures and Ordinal level measures are discrete by nature. However, many interval or ratio level measures may also be discrete. No matter how many runners are left on base, you cannot have 4.25 runs in a softball game. Even if you did not vote in every race on the ballot, you either did or did not participate in the election. A person cannot be said to have participated in 12.3 elections.

## Continuous Measures

Continuous measures consist of gradations that, in principle, can be infinitely subdivided. Income could be divided into thousands, hundreds, tens, ones, tenths, hundredths, thousandths, etc. Age can be divided into years, months, weeks, days, hours, minutes, seconds, tenths of seconds, etc.

# Accuracy

Accuracy concerns the question of how we measure our variables and how these measures relate to our theoretical concepts. The variables we choose to measure and how we choose to measure them must match the parameters of the theory we are seeking to test. A chief problem of accuracy in measurement is ensuring that your variables correspond to your theoretical concepts closely enough that the relationship between your measured variables truly reflects the relationship between those concepts. In order to be accurate, a measure must be reliable and valid.

### Example: Social Status

Social Status is a concept often applied in the social and behavioral sciences. Unfortunately, this can often be a subjective concept. Depending on the theory you are testing, your measure of an individual's Social Status may include elements of such varying measures as income, occupational prestige, level of education, community standing, community involvement, etc. Which of these specific elements you choose to measure depends on the theory you are testing. Further, even when using a measure such as income, failure to account for cross-regional differences in cost of living could result in different classifications for individuals who hold relatively similar economic positions in their respective communities.

### Example: Presidential Legislative Effectiveness

A student might want to evaluate the factors that contribute to the U.S. president's legislative effectiveness. One might measure legislative effectiveness by the number of the president's proposals which were passed into law by Congress. Among the factors that one might hypothesize would impact a president's legislative effectiveness is the influence of that president. Certainly, public approval ratings contribute to a president's influence, but what else? Richard Neustadt has noted that a president's power is determined by his "power to persuade." Deciding how you intend to measure a president's "persuasiveness" would greatly impact the validity of the test of your theory.

## Reliability

A measure is reliable when repeated applications yield the same results; that is, a reliable measure will produce the same answer each time it is applied to a particular object.

### Example: Reliable and Unreliable Measures

If one desired to measure the size of one's methodology classroom, one might choose to use either a tape measure or one's own "strides" or "paces." Because the lengths of the units of measurement on the tape measure have been standardized, you should get the same result each time you use the tape measure to measure the room. However, because the length of one's stride varies not only from person to person but also slightly with any given stride from the same person, you are unlikely to achieve perfectly identical measures by repeatedly pacing off the distance from one side of the room to the other. Therefore, measuring the room in feet and inches using a tape measure will be more reliable than measuring the room in "paces."
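The contrast can be illustrated with a small simulation. This is only a sketch under stated assumptions: the room length and the amount of variation in each method are invented numbers, not measurements, and the random noise simply stands in for the inconsistency of each tool. A smaller spread across repeated measurements indicates a more reliable measure.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_LENGTH_FT = 30.0  # assumed true length of the room, in feet
TAPE_ERROR_SD = 0.02   # assumed variation in tape-measure readings, in feet
PACE_ERROR_SD = 1.0    # assumed variation in pacing estimates, in feet
REPETITIONS = 20

# Repeat each measurement method twenty times.
tape_readings = [TRUE_LENGTH_FT + rng.gauss(0, TAPE_ERROR_SD) for _ in range(REPETITIONS)]
pace_readings = [TRUE_LENGTH_FT + rng.gauss(0, PACE_ERROR_SD) for _ in range(REPETITIONS)]

# The standard deviation of repeated readings summarizes (un)reliability.
tape_spread = statistics.stdev(tape_readings)
pace_spread = statistics.stdev(pace_readings)

print(f"Spread of tape readings: {tape_spread:.3f} ft")
print(f"Spread of pacing estimates: {pace_spread:.3f} ft")
```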

## Tests for Reliability

Test-retest is applying the same measure a second time to see whether the observer obtains the same results. Note: before drawing conclusions based on a test-retest method, the observer must consider whether any observed differences are due to the quality of the measure or the consistency of the application of that measure.

Internal consistency (sometimes called a "split-half check") is usually examined when multiple measures are used to measure the same concept. For example, multiple, differently-worded items in a survey might seek to measure latent racism, or multiple knowledge and interest questions may be combined to create a political knowledge scale. These items taken separately should reflect the assessments as determined collectively.
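A split-half check can be sketched as follows. The responses are invented, the scale is a hypothetical four-item scale, and splitting items into odd and even halves is one common convention rather than the only option. The Spearman-Brown correction, a standard adjustment for the shortened half-scales, is included as an assumption about how one might report the result.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical answers of six respondents to a four-item scale.
item_scores = [
    [4, 4, 3, 4],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
    [2, 2, 1, 2],
]

# Split the items into two halves: items 1 and 3 versus items 2 and 4.
half_a = [row[0] + row[2] for row in item_scores]
half_b = [row[1] + row[3] for row in item_scores]

# If the items measure the same concept, the two halves should agree.
split_half_r = pearson_r(half_a, half_b)

# Spearman-Brown correction estimates reliability of the full-length scale.
corrected = 2 * split_half_r / (1 + split_half_r)

print(f"Split-half correlation: {split_half_r:.2f}")
print(f"Spearman-Brown corrected: {corrected:.2f}")
```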

## Validity

A measure is valid when it measures what it claims to measure, or is supposed to measure. To the degree the measure truly mirrors the concept drawn from one's theory, that measure will be valid for testing that theory.

NOTE: If a measure is valid, then it will also be reliable. However, a measure can be reliable without being valid.

For example, if you use a properly marked quart container to measure how many liters of water are contained in a vessel, then you will get the same answer each time; therefore, the measure will be reliable. However, since the container is marked for quarts and not liters, your measure will not be valid.
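The quart-and-liter example can be made concrete with a short sketch. It assumes a US liquid quart (about 0.946353 liters) and an arbitrary true volume of 4 liters; the function name is hypothetical. Every reading is identical, so the measure is reliable, yet every reading misstates the volume in liters, so it is not valid.

```python
LITERS_PER_US_QUART = 0.946353  # assumes US liquid quarts

true_volume_liters = 4.0  # assumed true volume of water in the vessel

def read_quart_container(volume_liters):
    """Return the number read off a container marked in quarts."""
    return volume_liters / LITERS_PER_US_QUART

# Take the same measurement five times.
readings = [read_quart_container(true_volume_liters) for _ in range(5)]

# Reliable: repeated applications give the same answer.
is_reliable = len(set(readings)) == 1

# Not valid: the number read is not the volume in liters.
error = readings[0] - true_volume_liters

print("Readings:", [round(r, 3) for r in readings])
print("Same answer every time:", is_reliable)
print("Error if treated as liters:", round(error, 3))
```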

## Random versus Non-random Error

One way to understand the difference between reliability and validity is to examine the difference between random and non-random error.

Random error occurs primarily from mistakes. These may be coding errors or errors from respondents that are not systematic, but rather stochastic (random and non-deterministic). There is no pattern to random error, and an error on the "high side" of a measure is just as likely as an error on the "low side." With a large enough sample size, these errors tend to average out and will not fatally skew your results.

Non-random error is often systemic, in that it permeates the entire system, and systematic, in that it happens repeatedly in the same way. Over the long run, this biased measure will skew your distribution, and any research conclusions drawn from the application of such measures will be fatally flawed. An example of non-random error may be seen in self-reported voting behavior. When survey respondents misreport their voting behavior, the error nearly always inflates the number of times an individual has voted in recent elections. This behavior may be traced to social desirability (the tendency of survey respondents to report behavior or attitudes that they feel will be positively viewed by others).
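The difference between the two kinds of error can be sketched with a simulation. All numbers here are assumptions chosen for illustration: a true value of 3 elections, random noise with a standard deviation of 1, and an average over-report of 0.8 elections. Treating the count as a continuous number is a simplification. With a large sample, the random errors average out, while the systematic over-report remains in the mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

TRUE_VOTES = 3.0       # assumed true number of recent elections voted in
SAMPLE_SIZE = 10_000
OVERREPORT_BIAS = 0.8  # assumed average inflation from social desirability

# Random error only: mistakes are as likely to be high as low.
random_only = [TRUE_VOTES + rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]

# Non-random error: the same noise plus a consistent upward bias.
biased = [TRUE_VOTES + OVERREPORT_BIAS + rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]

mean_random = statistics.mean(random_only)
mean_biased = statistics.mean(biased)

print(f"True value: {TRUE_VOTES}")
print(f"Mean with random error only: {mean_random:.2f}")
print(f"Mean with systematic over-reporting: {mean_biased:.2f}")
```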
