That’s a pretty broad title, but, really, what we’re talking about here are some fundamentally different ways to treat data as we work with it. This topic can seem academic, but it is relevant for web analysts specifically and researchers broadly. Yes, this topic turns out to be pretty darn important when it comes time to apply statistical operations and to build and test models.
So, we have to start with the basics: the nature of data. There are four types of data (or scales of measurement): nominal, ordinal, interval, and ratio.
Each offers a unique set of characteristics, which impacts the type of analysis that can be performed.
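The distinction between the four types of scales centers on three characteristics: whether order matters, whether the distance between values is interpretable, and whether a true zero exists.
Nominal scales measure categories and have the following characteristics: order does not matter, the distance between categories is not interpretable, and there is no true zero.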
Consider traffic source (or last touch channel) as an example in which visitors reach our site through a mutually exclusive channel, or last point of contact. These channels would include:
(This list looks artificially short, but the logic and interpretation would remain the same for nine channels or for 99 channels.)
If all we want to know is that the channels differ from one another, then we could count the number of visits from each channel. Those counts can be considered nominal in nature.
Suppose the counts looked like this:
Channel | Count of Visits |
---|---|
Paid Search | 2,143 |
Organic Search | 3,124 |
 | 1,254 |
Display | 2,077 |
With nominal data, reordering the four channels does not alter the interpretation. Suppose we, instead, viewed the data like this:
Channel | Count of Visits |
---|---|
Display | 2,077 |
Paid Search | 2,143 |
 | 1,254 |
Organic Search | 3,124 |
The order of the categories does not matter.
And, the distance between the categories is not relevant. On a nominal scale, we would not say that Organic Search is roughly one and a half times Paid Search or that Display is about two-thirds of Organic Search. While there is an arithmetic relationship between these counts, that is only relevant if we treat the scales as ratio scales (see the Ratio Scales section below).
Finally, zero holds no meaning. We could not interpret a zero because it does not occur in a nominal scale.
Appropriate statistics for nominal scales: mode, count, frequencies
Displays: histograms or bar charts
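To make the nominal treatment concrete, here is a minimal Python sketch that computes the mode and the relative frequencies from the counts in the table above. The label "Channel D" is a hypothetical stand-in for the channel whose name is missing from the table; the variable names are illustrative only.

```python
from collections import Counter

# Visit counts from the table above. "Channel D" is a hypothetical
# placeholder for the channel whose name is missing in the table.
visits = {
    "Paid Search": 2143,
    "Organic Search": 3124,
    "Channel D": 1254,
    "Display": 2077,
}

counts = Counter(visits)
total = sum(counts.values())

# Mode: the category with the highest count.
mode_channel, mode_count = counts.most_common(1)[0]

# Relative frequencies: each channel's share of all visits.
frequencies = {channel: n / total for channel, n in counts.items()}

# Listing the categories in a different order changes none of these summaries.
reordered = Counter(dict(sorted(visits.items())))
assert reordered.most_common(1)[0] == (mode_channel, mode_count)

print(f"Mode: {mode_channel} ({mode_count} visits)")
for channel, share in frequencies.items():
    print(f"{channel}: {share:.1%}")
```

The mode and the frequencies depend only on which category each visit falls into, which is why they remain valid summaries for nominal data.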
At the risk of providing a tautological definition, ordinal scales measure, well, order. So, our characteristics for ordinal scales are: order matters, the distance between ranks is not interpretable, and there is no true zero.
Let’s work through our traffic source example and rank the channels based on the number of visits to our site, with “1” being the highest number of visits:
Channel | Rank by Visits |
---|---|
Organic Search | 1 |
Paid Search | 2 |
Display | 3 |
 | 4 |
Again, for this example, we are limiting ourselves to four channels, but the logic would remain the same for ranking nine channels or 99 channels.
By ranking the channels from most to fewest visits in terms of last point of contact, we’ve established an order.
However, the distance between the rankings is unknown. From the ranks alone, Organic Search could have one more visit than Paid Search or one hundred more visits.
Finally, zero holds no meaning. We could not interpret a zero because it does not occur in an ordinal scale. An item such as Organic Search could not maintain a zero ranking.
Appropriate statistics for ordinal scales: count, frequencies, mode, median
Displays: histograms or bar charts
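The ranking can be sketched in Python as follows. The counts come from the nominal example earlier; "Channel D" is again a hypothetical stand-in for the channel whose name is missing from the tables. The sketch also computes the gaps between neighbouring channels to show what the ranks throw away.

```python
# Visit counts from the nominal example. "Channel D" is a hypothetical
# placeholder for the channel whose name is missing in the tables.
visits = {
    "Paid Search": 2143,
    "Organic Search": 3124,
    "Channel D": 1254,
    "Display": 2077,
}

# Sort channels from most to fewest visits and assign ranks starting at 1.
ordered = sorted(visits, key=visits.get, reverse=True)
ranks = {channel: position for position, channel in enumerate(ordered, start=1)}

# Gaps in visits between each channel and the next one down the ranking.
gaps = [visits[a] - visits[b] for a, b in zip(ordered, ordered[1:])]

for channel in ordered:
    print(ranks[channel], channel)
print("Gaps between adjacent ranks:", gaps)
```

Adjacent ranks always differ by exactly one, while the underlying gaps here are 981, 66, and 823 visits. Once only the ranks are kept, those gaps cannot be recovered.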
Interval scales provide insight into the variability of the observations or data. Classic interval scales are Likert scales (e.g., 1 - strongly agree and 9 - strongly disagree) and Semantic Differential scales (e.g., 1 - dark and 9 - light). In an interval scale, users could respond to “I enjoy opening links to the website from a company email” with a response ranging on a scale of values.
The characteristics of interval scales are: order matters, the distance between values is interpretable, and there is no true zero.
Unless a web analyst is working with survey data, it is doubtful he or she will encounter data from an interval scale. More likely, a web analyst will deal with ratio scales (next section).
Appropriate statistics for interval scales: count, frequencies, mode, median, mean, standard deviation (and variance), skewness, and kurtosis.
Displays: histograms or bar charts, line charts, and scatter plots.
An Illustrative Side Note About Temperature
An argument exists about temperature. Is it an interval scale or an ordinal scale? Many researchers argue for temperature as an interval scale. It offers order (e.g., 212\(^\circ\) F is hotter than 32\(^\circ\) F), distance (e.g., the difference between 40\(^\circ\) F and 44\(^\circ\) F is the same as the difference between 100\(^\circ\) F and 104\(^\circ\) F), and lacks a true zero (e.g., 0\(^\circ\) F is not the same as 0\(^\circ\) C). However, other researchers argue for temperature as an ordinal scale because of the issue related to distance. 200\(^\circ\) F is not twice as hot as 100\(^\circ\) F. The human brain registers both temperatures as equally hot (if standing outside) or mild (if touching a stove). Finally, we would not say that 80\(^\circ\) F is twice as warm as 40\(^\circ\) F or that 30\(^\circ\) F is a third as warm as 90\(^\circ\) F.

Ratio scales appear as interval scales with a true zero. They have the following characteristics: order matters, the distance between values is interpretable, and a true zero exists.
Income is a classic example of a ratio scale:
In web analytics, the number of visits and the number of goal completions serve as examples of ratio scales. A thousand visits is a third of 3,000 visits, while 400 goal completions are twice as many as 200 goal completions. Zero visitors or zero goal completions should be interpreted as just that: no visits or completed goals (uh-oh… did someone remove the page tag?!).
For the web analyst, the statistics for ratio scales are the same as for interval scales.
Appropriate statistics for ratio scales: count, frequencies, mode, median, mean, standard deviation (and variance), skewness, and kurtosis.
Displays: histograms or bar charts, line charts, and scatter plots.
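One way to see what the true zero buys us is to check which comparisons survive a change of unit. The sketch below is an illustration under simple assumptions, not a formal test: it uses the visit figures from this section and the Fahrenheit values from the temperature side note, and the helper `fahrenheit_to_celsius` is introduced here only for the example.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Ratio scale: visits have a true zero, so a ratio is the same in any unit.
visits_a, visits_b = 1000, 3000
ratio_visits = visits_a / visits_b
ratio_in_thousands = (visits_a / 1000) / (visits_b / 1000)

# Interval scale: Fahrenheit has no true zero, so a ratio depends on the unit.
ratio_f = 200 / 100
ratio_c = fahrenheit_to_celsius(200) / fahrenheit_to_celsius(100)

# Differences, however, stay equal to each other after the conversion.
diff_low_c = fahrenheit_to_celsius(44) - fahrenheit_to_celsius(40)
diff_high_c = fahrenheit_to_celsius(104) - fahrenheit_to_celsius(100)

print(f"Visits ratio: {ratio_visits:.3f} vs {ratio_in_thousands:.3f}")
print(f"Temperature ratio: {ratio_f:.2f} in F, {ratio_c:.2f} in C")
print(f"Equal 4-degree F gaps in C: {diff_low_c:.2f} and {diff_high_c:.2f}")
```

The visits ratio is one third whether we count visits or thousands of visits. The temperature "ratio" is 2 in Fahrenheit but about 2.47 in Celsius, while the two four-degree gaps remain equal to each other. That is the practical difference between a ratio scale and an interval scale.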
An Important Note: Don’t let the term “ratio” trip you up. Laypeople (aka, “non-statisticians”) are taught that ratios represent a relationship between two numbers. For instance, conversion rate is the “ratio” of orders to visits. But, as illustrated above, that is an overly narrow definition when it comes to statistics.
The table below summarizes the characteristics of all four types of scales.
Characteristic | Nominal | Ordinal | Interval | Ratio |
---|---|---|---|---|
Order Matters | No | Yes | Yes | Yes |
Distance Is Interpretable | No | No | Yes | Yes |
Zero Exists | No | No | No | Yes |
Did you notice that we used channel for three of our four examples? And, for all three, the underlying metric was “visits.” What that means is that any given variable isn’t inherently a single type of data (type of scale). It depends on how the data is being used.
What that means is that some types of scales can be transformed to other types of scales. We can convert or transform our data from ratio to interval to ordinal to nominal. However, we cannot convert or transform our data from nominal to ordinal to interval to ratio.
Put another way, take a look at the cheat sheet above. If you have data using one scale, you can change a “Yes” to a “No” (and, thus, change the type of scale), but you cannot change a “No” to a “Yes.”
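The "Yes can become No, but No cannot become Yes" rule can be written down as a small check. This is only a sketch: the `SCALES` dictionary transcribes the cheat sheet, and `can_convert` is a hypothetical helper name, not part of any library.

```python
# The cheat sheet as data:
# (order matters, distance is interpretable, zero exists)
SCALES = {
    "nominal":  (False, False, False),
    "ordinal":  (True,  False, False),
    "interval": (True,  True,  False),
    "ratio":    (True,  True,  True),
}

def can_convert(source, target):
    """Return True if every property the target needs is present in the source.

    Dropping a property (Yes -> No) is allowed; inventing one (No -> Yes) is not.
    """
    return all(
        have or not need
        for have, need in zip(SCALES[source], SCALES[target])
    )

print(can_convert("ratio", "nominal"))    # information is discarded: allowed
print(can_convert("nominal", "ordinal"))  # order would have to be invented
```

Running the check over every pair of scales reproduces the one-way chain from ratio to interval to ordinal to nominal.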
Pause here to take an aspirin as needed, should your head be starting to hurt.
As an example, let’s say our website receives 10,000 visits in a month. That figure, 10,000 visits, is on a ratio scale. We could break it down into the number of visits in each week of that month (let’s pick February 2015 as our month, since the first of the month fell on a Sunday and there were exactly four weeks in the month!):
We could treat these numbers as interval; specifically, an equal width interval. However, there is little reason – conceptually or managerially – to treat these numbers as interval. So, let’s move on.
We could rank the weeks based on the number of visits, which would transform the data to an ordinal scale. From most to least number of weekly visits:
Finally, we could group week 2 and week 4 into “heavy traffic” weeks and group week 1 and week 3 into “light traffic” weeks, and we would have created a nominal scale. The order heavy-light or light-heavy would not matter provided we remember the coding effort.
We started with a ratio scale that we ultimately transformed into a nominal scale. As we did so, we lost a lot of information. But, by transforming this data, we can use different analytical tools to answer different types of questions.
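The whole chain can be sketched in a few lines of Python. The weekly figures below are hypothetical: they were chosen only so that they sum to 10,000 and so that weeks 2 and 4 come out as the heavier weeks, in line with the grouping described above.

```python
# Hypothetical weekly visits (assumed values that sum to 10,000).
weekly_visits = {"Week 1": 2000, "Week 2": 3000, "Week 3": 1800, "Week 4": 3200}
assert sum(weekly_visits.values()) == 10_000

# Ratio -> ordinal: rank the weeks from most to fewest visits.
ordered = sorted(weekly_visits, key=weekly_visits.get, reverse=True)
ranks = {week: position for position, week in enumerate(ordered, start=1)}

# Ordinal -> nominal: the top two weeks become "heavy traffic",
# the bottom two become "light traffic".
labels = {
    week: "heavy traffic" if rank <= 2 else "light traffic"
    for week, rank in ranks.items()
}

for week in weekly_visits:
    print(week, weekly_visits[week], ranks[week], labels[week])
```

Each step keeps less information than the one before it: the ranks no longer say how far apart the weeks are, and the labels no longer say which heavy week was heavier.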