A large part of R’s success is the ecosystem of open-sourced packages that add extra functionality to what the core installation of R (referred to as “base R”) includes.
The Comprehensive R Archive Network (CRAN) is the official central repository for R packages, where human-reviewed packages go through quality checks for publishing on the CRAN network around the world. It is through CRAN that you install packages with the install.packages()
function, or via the Packages pane in the bottom right of RStudio.
One of the nicer aspects of having CRAN as a centralized resource is that, well, you don’t actually have to visit the site very often. When you use the install.packages()
function without any of the optional parameters, R automatically looks for the package on CRAN, downloads it, and installs it. There is no need to download a file and then double-click to install it.
This will become absolutely second nature, but it’s one of those things that is not 100% intuitive at first. When you’re using a package (which, really, means you’re using one or more of the functions in a package), there are two things that have to happen:
The package has to be installed. This is what is described above. The first time you use a package, you will need to type install.packages("[the package name]")
(note that the name of the package is inside quotation marks) in the console and press <Enter>
. That will pull the package (and any packages the package depends on – few packages are built entirely from scratch!) from CRAN. And…it will show up in the Packages tab in the bottom right of your RStudio environment. This is a one-time step, although there is no harm in installing the same package multiple times, and, occasionally, you will find that a package has been updated and needs to be re-installed.
The package has to be loaded. In your script (or in the console), enter library([the package name])
(unlike install.packages()
, the name of the package is not placed in quotation marks when using library()
). Typically, you will have a list of these at the beginning of your scripts (although there are other techniques for centralizing a list of packages you commonly use…we’re not going to go there for now). Once a package is loaded, it will show with a checkbox next to it in the Packages list in your RStudio environment.
When you get a “function not found” error when running a script, 9 times out of 10 that’s because the function you’re calling is in a package that isn’t loaded.
Throughout this site, as well as in examples across the interwebs, you will see code examples that look something like this:
# This installs googleAnalyticsR if you haven't got it installed already
if(!require(googleAnalyticsR)) install.packages("googleAnalyticsR")
# This then loads the library
library(googleAnalyticsR)
The if()
statement is simply conditionally checking to see if the package is already installed. If it is, then we don’t want the script to re-install it. While there is (generally) no harm in re-installation, it’s unnecessary and will tack on a few seconds to the running time for your script.
If, however, the package is not already installed, then you generally want to install it. Otherwise, the script will break when you try to load the library or use any of the functions within it.
This list simply cannot be comprehensive, but, to give you a sense of the breadth and scope of R Packages, below are a few representative examples:
You will learn to know the name “Hadley Wickham,” as he is quite possibly the most influential R user (and package and content creator) of the modern era. He also has now worked for RStudio for a number of years. Most of the packages below are part of what used to be called “the Hadleyverse,” because Wickham was key in their creation. But, Wickham was successful at rebranding these as “the Tidyverse” because, at their core, they’re focused on working with data that is in a “tidy” format (which we’ll get to later).
The creators of this site recommend working in the tidyverse world, which means almost every one of their scripts installs the tidyverse package (library(tidyverse)
), which is a quick way to install all (or…most?) of the packages in the tidyverse (there are some Tidyverse-adjacent packages that are part of the Tidyverse, but that aren’t part of the core Tidyverse, so they have to be installed separately if they are being used). There is a whole website that goes into great detail as to the tidyverse packages. Some of the key packages in the Tidyverse are:
%>%
operator (which is available once you’ve loaded dplyr…but actually comes from another package that dplyr uses called magrittr) can streamline the heck out of your code!There are other packages in the core Tidyverse, but we don’t want to overwhelm you (yet). There are a few other packages that are super common, so it’s good to know about them:
There are also thousands of more experimental packages that are only available through Github. These packages have become much easier to work with since the introduction of the package devtools, which via its function install_github
, provides access to this universe of packages through scriptable commands.
Beware, though, that the packages on Github are, by their nature, more experimental, so exercise due care to ensure they are trustworthy!