Remember: Rome wasn’t built in a day. And “Rome” starts with “R.”
R takes time and practice to master. But, hopefully, you’ve been energized by some of the exercises and examples shared today. Let’s do a quick review.
The Basics
Some of the keys we covered early in the day and then have repeated already to the point that, with luck, they’re already seeming intuitive:
RStudio vs. R – you work in RStudio, but the underlying engine is R. And…each of the four main “panes” in RStudio is useful!
The Console vs. Scripts – you will curse VBA for not having this capability, should you ever get back into it!
Packages – they have to first be installed (install.packages()) and then have to be loaded (library()) to use them.
Help!(?) – to get help on a function, simply type ? and then the function name.
Case Sensitivity – R is case-sEnsItiVe. This can trip you up. But, it’s a good reason to follow a consistent coding style.
<- – to assign a value (or function) to an object, use <- rather than =.
Possibly Starting to Sink In
Beyond the basics, we covered other topics that can take a while to wrap your head around. Keep at it. These are key!
Atomic Classes – these are different data types, and the main ones are character(), numeric(), logical(), and Date().
Coercion – this is simply a fancy way of saying “you can often convert a value from one class to another” using as. functions (as.character(), as.numeric(), etc.)
Vectors – vectors are simply a set of values with the same atomic class.
Data Frames – data frames are super handy and are like an Excel spreadsheet on steroids; the one catch is that each column in a data frame can only be a single atomic class.
Lists – lists are like data frames…but way more confusing. They can hold multiple elements of variable length, have nested values, and have all sorts of combinations of classes. They are confusing…but also hold much of the power of R.
Subsetting – there are many ways to subset data, including using the $ operator, using [] notation, and doing all manner of things inside those []. There is also dplyr()…
Almost Certainly “More Study Required”
(Hah! And we called this page “Key Topics!” Are you starting to feel like when you ask for “KPIs” an get 37 metrics in return?)
Some additional topics that start to move from “making R work” to “making it work efficiently and awesomely:”
Tidy Data / dplyr() – tidy data is “data that would be very easy to pivot in Excel” (no dimensions across columns).
The pipe – if you’re using dplyr() you can use the %>% operator to pass the output of a function in to a subsequent function. Once you get the hang of it, this makes for efficient and readable code.
Untidying Data – once you have tidy data, it’s very easy to untidy it (or subsets of it) for specific uses; the key to this are the functions in the tidyr package.
Analytics APIs – it’s getting easier and easier to bring data in directly from Google Analytics, the Google Search Console, Adobe Analytics, and more. There are packages for this!
.Renviron – you don’t want to inadvertently share your API keys, so use a .Renviron file to store those as environment variables and then Sys.getenv() to retrieve and assign them.
Github – RStudio integrates nicely with Github, and, as a platform built for the version control of code, it’s a platform you really should consider using; the free service makes all of your repositories public, but it’s inexpensive to have a paid service that enables private repositories, too.
And That Just Gets Us to Statistics!
The Nature of Data – getting from “dimensions and metrics” to “variables” opens up a lot of (sometimes confusing) possibilities. Variables can be nonmetric or metric; or, more specifically, they can have nominal, ordinal, interval, or ratio scales.
Different statistical methods – we covered the Cross Tabulation with a \(\chi^2\) Test for Independence, ANOVA, Correlation, and Regression; those are a good start!
And Back to More R Stuff!
Making Fast R Code – at first…don’t worry too much about it; but, be aware that, when you’ve got something running for 20 minutes or 40 minutes or over night, it might be because you didn’t appropriate use vectorisation (you used loops), or you didn’t take advantage of the best packages or parallel processing
Functions – you started off using (and will continue to use) functions from Base R and various R packages; you should pretty quickly be writing your own functions inside your scripts (as a start – see the next point) to make your code cleaner and easier to read and maintain
Lists – they can be confusing, but they are truly unique and powerful; applying functions to lists is fast, slick, and awesome with lapply() once you’ve fully grokked it
Creating Your Own Packages – once you’re comfortable with the basics of R, and once you start using functions within your scripts, you will quickly want to start making your own packages for internal use. They let you re-use code very easily – just for yourself, or with others.
Creating APIs with R – using packages someone else built to access APIs is great…until those packages don’t exist; as long as a platform has an API, though, you can likely use the httr package to build your own functions to access it! (And maybe even put it in your own package; and maybe even publish that package!)
Getting to Deliverables
Data Visualization – while base plotting functions in R are good for quick visualizations, ggplot2 is where there starts to be real power in the layout and formatting. The “gg” in ggplot2() is for “grammar of graphics,” and that grammar can be a bit of a mind-bender. But, once you embrace geoms, aesthetics, and themes, you will be off to the races!
Interactivity Easily – packages like highcharter and plotly add a nice level of interactivity to your visualizations very easily
RMarkdown – RMarkdown has a relatively simple syntax that can be used to output results to slides, to a PDF, to HTML, to emails! and more.
Shiny – Shiny lets you build web applications that are actually “reactive” (when the user changes something, the data/visualisations refresh automatically).
RMarkdown WITH Shiny – yeah; it’s awesome.
That’s about it. Once you’ve got these “key” topics mastered…you’ll realize that your adventure with R has, really, just begun!