Time-series are a special sub-class of data: ordered observations over time.
They come up frequently in digital analytics, and R uses a special class to deal with them since they have some unique properties. A guide to general R time-series is here, and a list of possible packages to use with time-series is in this CRAN task view. If you want to get into forecasting, then another New Zealand R hero is Rob Hyndman, who has lots of techniques described on his blog.
Time-series are created in base R via the `ts()` function, which you typically feed numeric data that is assumed to be in order of observation. For most purposes you will be starting from a data.frame and will need to transform it into a time-series for the R functions to work.
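For example, here is a minimal sketch with made-up numbers showing how `ts()` turns a plain numeric vector into a time-series:

```r
## ten made-up daily session counts, already in observation order
sessions <- c(120, 135, 128, 150, 160, 158, 170, 165, 172, 180)

## frequency = 7 declares a weekly cycle for daily data
my_ts <- ts(sessions, frequency = 7)

class(my_ts)  ## "ts"
```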
This exercise requires a `web_data` data frame. You can either load up some sample data by completing the I/O Exercise (which is what is shown in the step-by-step instructions below), or, if you have access to a Google Analytics account, you can use your own data by following the steps on the Google Analytics API page. Or, if you have access to an Adobe Analytics account, you can use your own data by following the Generating web_data steps on the Adobe Analytics API page.

Once you have a `web_data` data frame to work with, the command `head(web_data)` should return a table that, at least structurally, looks something like this:
date | channelGrouping | deviceCategory | sessions | pageviews | entrances | bounces |
---|---|---|---|---|---|---|
2016-01-01 | (Other) | desktop | 19 | 23 | 19 | 15 |
2016-01-01 | (Other) | mobile | 112 | 162 | 112 | 82 |
2016-01-01 | (Other) | tablet | 24 | 41 | 24 | 19 |
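If you don't have access to either API, one option is to simulate a data frame with the same structure so you can follow along. The sketch below uses randomly generated values, not real analytics data:

```r
## a hypothetical stand-in for web_data, with simulated session counts
set.seed(123)
web_data <- expand.grid(
  date = seq(as.Date("2016-01-01"), as.Date("2016-07-31"), by = "day"),
  channelGrouping = c("(Other)", "Direct", "Organic Search", "Social", "Video"),
  deviceCategory = c("desktop", "mobile", "tablet"),
  stringsAsFactors = FALSE
)
web_data$sessions  <- rpois(nrow(web_data), 200)
web_data$pageviews <- rpois(nrow(web_data), 300)
web_data$entrances <- web_data$sessions
web_data$bounces   <- rpois(nrow(web_data), 100)
```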
We will start by pivoting the data like before:
```r
## use the tidyverse to pivot the data
library(dplyr)
library(tidyr)

## keep only desktop rows, and the date, channelGrouping and sessions columns,
## then pivot so each channel becomes its own column
pivoted <- web_data %>%
  filter(deviceCategory == "desktop") %>%
  select(date, channelGrouping, sessions) %>%
  spread(channelGrouping, sessions)

## get rid of any NAs and replace with 0
pivoted[is.na(pivoted)] <- 0

head(pivoted)
```
date | (Other) | Direct | Display | Email | Organic Search | Paid Search | Referral | Social | Video |
---|---|---|---|---|---|---|---|---|---|
2016-01-01 | 19 | 133 | 307 | 17 | 431 | 555 | 131 | 68 | 0 |
2016-01-02 | 156 | 1003 | 196 | 43 | 1077 | 1060 | 226 | 158 | 3 |
2016-01-03 | 35 | 1470 | 235 | 29 | 696 | 489 | 179 | 66 | 90 |
2016-01-04 | 31 | 1794 | 321 | 70 | 1075 | 558 | 235 | 46 | 898 |
2016-01-05 | 27 | 1899 | 309 | 74 | 1004 | 478 | 218 | 47 | 461 |
2016-01-06 | 21 | 1972 | 204 | 299 | 974 | 494 | 246 | 47 | 418 |
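Note that `spread()` has since been superseded in tidyr. On a recent tidyr version (1.1 or later), an equivalent sketch using its replacement, `pivot_wider()`, would be:

```r
## pivot_wider() replaces spread() and can fill the NAs in the same step
pivoted <- web_data %>%
  filter(deviceCategory == "desktop") %>%
  select(date, channelGrouping, sessions) %>%
  pivot_wider(names_from = channelGrouping,
              values_from = sessions,
              values_fill = 0)
```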
For data covering less than a year, a weekly seasonality is the most likely pattern. Specifying this seasonality when creating the time-series can be very helpful for forecasting, as it lets the models separate the real trend from the regular weekly cycle.
```r
## make a time-series object: drop the date column and declare weekly seasonality
web_data_ts <- ts(pivoted[-1], frequency = 7)

## time-series objects are set up to have useful plots
plot(web_data_ts, axes = FALSE)
```
With seasonal data, you can extract the seasonality (weekly in this example) so you can more easily see the overall trend of the data:

```r
## split the series into trend, seasonal and random components
decomp <- decompose(web_data_ts[, "Organic Search"])
plot(decomp)
```
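The individual components are also available on the returned object if you want to work with, say, the trend on its own:

```r
## decompose() returns the components as separate series
head(decomp$trend)     ## smoothed trend (NA at the edges of the series)
head(decomp$seasonal)  ## the repeating weekly pattern
```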
Forecasting looks at the trend of the data and makes a judgement on where that trend is heading. A good blog post from Hyndman on forecasting daily data is here.
A couple of example functions:
```r
library(forecast)

## fit an exponential smoothing state-space (ETS) model
fit <- ets(web_data_ts[, "Organic Search"])

## make the forecast
fc <- forecast(fit)
plot(fc)
```
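The forecast object holds the point forecasts and prediction intervals, and you can forecast further ahead via the `h` argument:

```r
## point forecasts and interval bounds
fc$mean   ## point forecasts
fc$lower  ## lower prediction interval bounds

## forecast 30 periods ahead instead of the default
fc_30 <- forecast(fit, h = 30)
plot(fc_30)
```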
`HoltWinters()` is a function in base R that also lets you take account of seasonal effects:

```r
## fit a Holt-Winters exponential smoothing model
fit2 <- HoltWinters(web_data_ts[, "Organic Search"])

## make the forecast, 25 periods ahead
fc2 <- forecast(fit2, h = 25)
plot(fc2)
```
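To compare the two models, the forecast package's `accuracy()` function reports in-sample error measures such as RMSE and MAPE:

```r
## training-set error measures for each model (lower is generally better)
accuracy(fc)   ## the ets() model
accuracy(fc2)  ## the HoltWinters() model
```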
We can only work with numeric data, but we would also like to keep the date information, mainly for labelling purposes. The `xts` package lets you create the time-series whilst specifying the date labels, like so:

```r
library(xts)

## create an xts time-series object (xts extends the zoo class),
## using the date column as the time index
web_data_xts <- xts(pivoted[-1], order.by = as.Date(pivoted$date), frequency = 7)
```

`web_data_xts` will look similar to when it was a data.frame, but it's now in the right class for some nice related functions.
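One of those nice functions is date-based subsetting: xts objects can be sliced with ISO-8601 style date strings:

```r
## all rows in February 2016
web_data_xts["2016-02"]

## a date range, with from/to separated by "/"
web_data_xts["2016-01-15/2016-02-15"]
```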
A good example of a library that needs `xts` input is CausalImpact, a fantastic library from Google that is well suited to digital marketing.
CausalImpact estimates how much effect an event at a certain point in time had on your metrics, in absolute and relative terms. An event could be, for instance, a TV campaign starting or a change to your title tags. It also lets you add control segments, so you can adjust for known effects.
As an example, we assume the observed peak in Social traffic is due to the start of a campaign on May 15th, 2016 (in reality you should not cherry-pick dates like this!) and we would like to observe its effect on Video sessions. We also add Direct sessions as a control to help account for general website trends.
```r
library(CausalImpact)

## define the periods before and after the event
pre.period  <- as.Date(c("2016-02-01", "2016-05-14"))
post.period <- as.Date(c("2016-05-15", "2016-07-15"))

## data in order of response, predictor1, predictor2, etc.
model_data <- web_data_xts[, c("Video", "Social", "Direct")]

impact <- CausalImpact(model_data, pre.period, post.period)
plot(impact)
```
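The fitted object can also be summarised numerically, or as an automatically generated written report:

```r
## numeric summary of the absolute and relative effects
summary(impact)

## a written interpretation of the results
summary(impact, "report")
```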
CausalImpact is the workhorse behind the GA Effect app.