We’re going to skip the step of pulling data directly from an API (for now), so just download this file to your working directory, and then load it into your R environment as an object (a data frame) named web_data:
web_data <- read.csv("gadata_example_2.csv", stringsAsFactors = FALSE)
We’re not going to dive into why we’re including stringsAsFactors = FALSE at this point. The more comfortable you are with statistics, the more likely the phrase “this factor has three levels” will intuitively mean something to you. Let’s just put it aside for now.
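If you’re curious anyway, here is a minimal sketch of what the argument changes. It doesn’t use the downloaded file; the inline CSV text and its column names (date, deviceCategory, sessions) are made up for illustration. Note that R 4.0.0 and later already default to stringsAsFactors = FALSE, so the argument mainly matters on older versions.

```r
# Hypothetical stand-in data; these columns are assumptions, not the real file.
csv_text <- "date,deviceCategory,sessions
2016-01-01,desktop,120
2016-01-01,mobile,80
2016-01-01,tablet,15
2016-01-02,desktop,130"

# Read the same text twice, changing only stringsAsFactors.
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_strings$deviceCategory)   # "character": plain text values
class(as_factors$deviceCategory)   # "factor": a categorical variable
levels(as_factors$deviceCategory)  # "desktop" "mobile" "tablet"
```

In the second version, deviceCategory is a factor with three levels, which is all that phrase means.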
Double-click on web_data in the Environment window to view the data like a spreadsheet.
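If you’d rather stay in the console, something along these lines gives a similar look at a data frame. This is only a sketch: example_data is a small hypothetical stand-in so the snippet runs on its own, and you would swap in web_data once it’s loaded.

```r
# Hypothetical stand-in for web_data; the column names are assumptions.
example_data <- data.frame(
  date = c("2016-01-01", "2016-01-01", "2016-01-02"),
  deviceCategory = c("desktop", "mobile", "desktop"),
  sessions = c(120, 80, 130),
  stringsAsFactors = FALSE
)

str(example_data)   # column names, types, and the first few values
head(example_data)  # the first rows, printed in the console

# Opens the spreadsheet-style viewer, but only in an interactive session.
if (interactive()) View(example_data)
```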
Note that the data, if it were in Excel, would be primed and ready to have a pivot table made out of it: each row contains a value for each dimension and for each metric. Put (overly) simply, this is what is referred to as a “tidy” data structure, and conventional wisdom is that this structure is, in the words of Keanu Reeves and Alex Winter, most excellent.
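To make the pivot table comparison a bit more concrete, here is a rough sketch using base R’s aggregate() on a small made-up data frame. The column names (date, deviceCategory, sessions) are assumptions for illustration and may not match the example file.

```r
# Hypothetical tidy data: one row per date/device combination, one metric.
tidy_example <- data.frame(
  date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"),
  deviceCategory = c("desktop", "mobile", "desktop", "mobile"),
  sessions = c(120, 80, 130, 95),
  stringsAsFactors = FALSE
)

# Roughly what a pivot table does: total the metric for each dimension value.
sessions_by_device <- aggregate(sessions ~ deviceCategory,
                                data = tidy_example, FUN = sum)
sessions_by_device
# desktop totals 250 sessions, mobile totals 175
```

Because every row carries a value for each dimension and each metric, summarizing by any dimension is a one-liner.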
There are certainly situations where we may want to move a dimension into the columns, and there are quick and easy ways to do that. But if we aim for tidy data as the default, we’ll be in good shape. Luckily, when pulling data from the Google Analytics or Adobe Analytics API, that’s the format the data arrives in.
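As one possible way to move a dimension into the columns, here is a sketch using base R’s reshape(). Packages such as tidyr offer other ways to do the same thing. The data frame and its column names are again made up for illustration.

```r
# Hypothetical tidy data; the column names are assumptions.
tidy_example <- data.frame(
  date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"),
  deviceCategory = c("desktop", "mobile", "desktop", "mobile"),
  sessions = c(120, 80, 130, 95),
  stringsAsFactors = FALSE
)

# Spread deviceCategory across the columns: one row per date,
# one sessions column per device category.
wide_example <- reshape(tidy_example,
                        idvar = "date",
                        timevar = "deviceCategory",
                        direction = "wide")

names(wide_example)  # "date" "sessions.desktop" "sessions.mobile"
```

The wide version is handy for side-by-side comparison, but the tidy version is usually the easier starting point.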
That’s really it for this exercise. Now that we have some data loaded up into our environment, though, we can start working with it, which we’ll do in the next exercise.