IMPORTANT NOTE: There is a bug in the Google Analytics v4 API that leaves this example only partially functional, and not functional enough to be usable. Be sure to read the full intro here for an explanation!
This example pulls sessions by device category and medium, but uses pivoting within the call to Google Analytics so that the device category values are across the columns, while the medium values are in the rows. Generally, this is a bit of an odd thing to do for a couple of reasons:
The same pivot can be done with the spread() function in tidyr (loaded as part of the tidyverse) after the data has been pulled as a flat/simple query.
But, it IS doable…except there is a bug in the API when it comes to pulling pivoted data: a column can get dropped! Check out the issue on GitHub for an update.
Even without the bug, I consider this approach vastly inferior to the approach used in the Pivoting the Data (after Querying) example.
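For comparison, here is a minimal sketch of what pivoting after the query looks like. It assumes a flat query that returns one row per medium and device category; the `flat_data` data frame is hand-built from a few of the values in the output further down this page and simply stands in for a real query result.

```r
library(tidyr)

# Stand-in for the result of a flat (unpivoted) query: one row per
# medium / deviceCategory combination. Hand-built for illustration only.
flat_data <- data.frame(
  medium         = rep(c("(none)", "display", "organic"), each = 3),
  deviceCategory = rep(c("desktop", "mobile", "tablet"), times = 3),
  sessions       = c(1122, 283, 22,
                     44, 25, 2,
                     2550, 293, 34),
  stringsAsFactors = FALSE
)

# Spread the deviceCategory values across the columns, one row per medium
pivoted_data <- spread(flat_data, key = deviceCategory, value = sessions)

pivoted_data
```

Note that spread() has since been superseded in tidyr by pivot_wider(), which does the same job using the names_from and values_from arguments.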
Be sure you’ve completed the steps on the Initial Setup page before running this code.
For the setup, we’re going to load a few libraries, load our specific Google Analytics credentials, and then authorize with Google.
# Load the necessary libraries. These libraries aren't all necessarily required for every
# example, but, for simplicity's sake, we're going ahead and including them in every example.
# The "typical" way to load these is simply with "library([package name])." But, the handy
# thing about using the approach below -- which uses the pacman package -- is that it will
# check that each package exists and actually install any that are missing before loading
# the package.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(googleAnalyticsR,  # How we actually get the Google Analytics data
               tidyverse,         # Includes dplyr, ggplot2, and others; very key!
               devtools,          # Generally handy
               googleVis,         # Useful for some of the visualizations
               scales)            # Useful for some number formatting in the visualizations
# Authorize GA. Depending on if you've done this already and a .ga-httr-oauth file has
# been saved or not, this may pop you over to a browser to authenticate.
ga_auth(token = ".ga-httr-oauth")
# Set the view ID and the date range. If you want to, you can swap out the Sys.getenv()
# call and just replace that with a hardcoded value for the view ID. And, the start
# and end date are currently set to choose the last 30 days, but those can be
# hardcoded as well.
view_id <- Sys.getenv("GA_VIEW_ID")
start_date <- Sys.Date() - 31 # 30 days back from yesterday (31 days in total, counting both ends)
end_date <- Sys.Date() - 1 # Yesterday
If that all runs with just some messages but no errors, then you’re set for the next chunk of code: pulling the data.
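If the view ID comes from an environment variable, it can be worth confirming it was actually found before querying. The sketch below is an optional addition, not part of the original example: check_view_id() is a hypothetical helper, and "12345678" is a placeholder rather than a real view ID. It also counts the days in the date range, since both the start and end dates are included.

```r
# Hypothetical helper: stop early if the view ID is empty.
# Sys.getenv() returns "" when the variable is not set.
check_view_id <- function(id) {
  if (is.na(id) || !nzchar(id)) {
    stop("No view ID found. Set GA_VIEW_ID in .Renviron or hardcode the value.",
         call. = FALSE)
  }
  id
}

# In the real script this would be check_view_id(Sys.getenv("GA_VIEW_ID"))
view_id_checked <- check_view_id("12345678")  # Placeholder value

# Count the days in the date range (both endpoints are included)
start_date <- Sys.Date() - 31
end_date <- Sys.Date() - 1
days_in_range <- as.integer(end_date - start_date) + 1

days_in_range
```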
We have to, first, define what we’re going to pivot. After that, it’s a pretty straightforward query.
# Start by defining the pivot object. See ?pivot_ga4() for details.
my_pivot_object <- pivot_ga4("deviceCategory",
                             metrics = "sessions")
# Pull the data. See ?google_analytics() for additional parameters. The anti_sample = TRUE
# parameter will slow the query down a smidge and isn't strictly necessary, but it will
# ensure you do not get sampled data.
ga_data <- google_analytics(viewId = view_id,
                            date_range = c(start_date, end_date),
                            metrics = "sessions",
                            dimensions = "medium",
                            pivots = list(my_pivot_object),
                            anti_sample = TRUE)
# Go ahead and do a quick inspection of the data that was returned. This isn't required,
# but it's a good check along the way.
head(ga_data)
| medium | sessions | deviceCategory.desktop.sessions | deviceCategory.mobile.sessions | deviceCategory.tablet.sessions |
|---|---|---|---|---|
| (none) | 1427 | 1122 | 283 | 22 |
| (not set) | 7 | 7 | 0 | 0 |
| display | 71 | 44 | 25 | 2 |
|  | 31 | 24 | 7 | 0 |
| organic | 2877 | 2550 | 293 | 34 |
| partner | 2 | 2 | 0 | 0 |
This is the apparent/possible bug noted in the intro: we seem to be missing a column! Check out the issue on GitHub for an update.
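One rough way to check for a dropped pivot column is to compare each row's sessions total against the sum of its pivoted columns. This is a sketch under assumptions, not a documented diagnostic: `ga_sample` is a hand-built copy of a few rows of the output, and it assumes every session belongs to exactly one device category, so a positive shortfall would point at sessions that live in a missing column.

```r
# Hand-built copy of a few rows of the pivoted output, for illustration
ga_sample <- data.frame(
  medium   = c("(none)", "(not set)", "display", "organic", "partner"),
  sessions = c(1427, 7, 71, 2877, 2),
  deviceCategory.desktop.sessions = c(1122, 7, 44, 2550, 2),
  deviceCategory.mobile.sessions  = c(283, 0, 25, 293, 0),
  deviceCategory.tablet.sessions  = c(22, 0, 2, 34, 0),
  stringsAsFactors = FALSE
)

# Find the pivoted columns by their shared prefix
pivot_cols <- grep("^deviceCategory\\.", names(ga_sample), value = TRUE)

# Sessions not accounted for by any pivoted column (0 means nothing is missing)
ga_sample$shortfall <- ga_sample$sessions - rowSums(ga_sample[pivot_cols])

ga_sample[, c("medium", "sessions", "shortfall")]
```

For the rows copied here, the pivoted columns add up to the sessions total, so the shortfall is zero throughout.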
The column headings are pretty ugly and redundant, and the sessions column isn’t necessarily one we’d want to keep, so let’s do a little cleanup.
# Remove the 'sessions' column
ga_data <- select(ga_data, -sessions)
# Use a little regex to strip out "deviceCategory" and "sessions" from the column names
names(ga_data) <- gsub("deviceCategory\\.(.*)\\.sessions", "\\1", names(ga_data))
# Check out the result of our handiwork
head(ga_data)
| medium | desktop | mobile | tablet |
|---|---|---|---|
| (none) | 1122 | 283 | 22 |
| (not set) | 7 | 0 | 0 |
| display | 44 | 25 | 2 |
|  | 24 | 7 | 0 |
| organic | 2550 | 293 | 34 |
| partner | 2 | 0 | 0 |
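To see what the regex in the cleanup step is doing, it can help to run it on the column names alone. This small sketch uses a character vector, `raw_names`, copied from the column headings in the earlier output rather than a live query result.

```r
# Column names as they came back from the pivoted query (minus 'sessions')
raw_names <- c("medium",
               "deviceCategory.desktop.sessions",
               "deviceCategory.mobile.sessions",
               "deviceCategory.tablet.sessions")

# The capture group (.*) keeps whatever sits between the prefix and the suffix.
# Names that don't match the pattern, like "medium", pass through unchanged.
clean_names <- gsub("deviceCategory\\.(.*)\\.sessions", "\\1", raw_names)

clean_names
```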
This could be a nice little heatmap, but to do that with ggplot2, we’d have to gather it up into a tidy format. Basically…unpivot it! That seems silly – better to have pulled it unpivoted! I’m taking a moral stance and not jumping through that particular set of hoops for this example. So, no visualization on this one!
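For reference only, the unpivoting step being skipped here would look roughly like the sketch below. It uses gather() from tidyr on `wide_data`, a hand-built copy of a few rows of the cleaned-up table, and stops short of any plotting.

```r
library(tidyr)

# Hand-built copy of a few rows of the cleaned-up, pivoted data
wide_data <- data.frame(
  medium  = c("(none)", "display", "organic"),
  desktop = c(1122, 44, 2550),
  mobile  = c(283, 25, 293),
  tablet  = c(22, 2, 34),
  stringsAsFactors = FALSE
)

# Gather every column except 'medium' back into key/value pairs,
# giving one row per medium / deviceCategory combination
tidy_data <- gather(wide_data, key = "deviceCategory", value = "sessions", -medium)

tidy_data
```

The result is the same shape a flat query would have returned in the first place, which is the point being made above.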
This site is a sub-site to dartistics.com