Here are some good habits, learned from experience, that make working with R easier.
Create a script and data folder in the project straight away.
Keep scripts in the script folder (./scripts/your-script.R), and raw or processed data written to the data folder (./data/your_data.csv).
Don't set the working directory (setwd()) in the scripts; it ruins portability.
Use stop() to write informative error messages.
Document functions in roxygen format (#' lines above the function).
Watch for typos in names (rowname instead of row.name, for instance).
Check that an object is the class a function expects (class(obj) == "list" not class(obj) == "data.frame"). Read the function documentation carefully.
Use str() a lot, along with ?[function name] to open a function's help page.
paste0 - shortcut for paste("blah", "blah", sep = "")
list() - it's a way to group related objects:
stuff <- list(mydata = data,
              the_author = "Bob",
              created = Sys.Date())
## can then access items via $
stuff$mydata
stuff$the_author
stuff$created
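Several of the tips above (roxygen comments, stop() for informative errors, checking the class of an object) can be combined in one small function. This is only a sketch: the function name column_classes and its error wording are made up for illustration and are not from any package.

```r
#' Report the class of each column in a data.frame
#'
#' @param df A data.frame.
#' @return A named character vector with one class per column.
column_classes <- function(df) {  # hypothetical helper, for illustration only
  if (!is.data.frame(df)) {
    # informative error: say what was expected and what was actually received
    stop("`df` must be a data.frame, not an object of class '",
         class(df)[1], "'", call. = FALSE)
  }
  # a data.frame is a list of columns, so sapply() visits each column in turn
  sapply(df, function(col) class(col)[1])
}

# normal use on the first three columns of mtcars
classes <- column_classes(mtcars[1:3])
classes

# passing a plain list triggers the error; tryCatch() captures the message
result <- tryCatch(column_classes(list(a = 1)),
                   error = function(e) conditionMessage(e))
result
```

The error message names both the expected and the received class, which is usually enough to find the offending call without opening a debugger.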
Learn the apply family: lapply(), apply(), sapply(), etc.
data.frames are special lists, just with equal-length objects.
A named vector works as a lookup table:
## example with some user Ids
lookup <- c("Bill","Ben","Sue","Linda","Gerry")
names(lookup) <- c("1231","2323","5353","3434","9999")
lookup
## 1231 2323 5353 3434 9999
## "Bill" "Ben" "Sue" "Linda" "Gerry"
## this is a big vector of Ids you want to lookup
big_list_of_ids <- c("2323","2323","3434","9999","9999","1231","5353","9999","2323","1231","9999")
lookup[big_list_of_ids]
## 2323 2323 3434 9999 9999 1231 5353 9999 2323
## "Ben" "Ben" "Linda" "Gerry" "Gerry" "Bill" "Sue" "Gerry" "Ben"
## 1231 9999
## "Bill" "Gerry"
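Two details of the named-vector lookup are easy to miss: the result keeps the Ids as names, and an Id that is not in the table comes back as NA rather than raising an error. A small sketch, where the Id "0000" is a made-up value with no entry:

```r
lookup <- c("1231" = "Bill", "2323" = "Ben", "5353" = "Sue")

# "0000" is a hypothetical Id that has no entry in the lookup table
ids <- c("2323", "0000", "1231")

# unname() drops the Id labels and leaves only the looked-up values
found <- unname(lookup[ids])
found

# Ids missing from the table show up as NA, so they can be listed explicitly
missing_ids <- ids[is.na(found)]
missing_ids
```

Checking for NA after the lookup is a cheap way to catch Ids that were mistyped or never added to the table.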
Reduce() operates repeatedly on a list, combining each element with the result so far. A good example is reading a folder full of files:
## say you have lots of files in folder "./data/myfolder"
## we can use lapply on read.csv to read in all the files:
folder <- "./data/myfolder"
## full.names = TRUE keeps the folder in each path so read.csv can find the file
filenames <- list.files(folder, full.names = TRUE)
## a list of data.frames read from the csv
df_list <- lapply(filenames, read.csv)
## operate rbind (bind the rows) on the list, iteratively
all_files <- Reduce(rbind, df_list)
## all_files is now one big data.frame
## in one line:
all_files <- Reduce(rbind, lapply(filenames, read.csv))
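To see what Reduce() is doing before pointing it at files, it helps to run it on plain numbers first. The sketch below then repeats the folder example in a self-contained way; the temporary folder and the two tiny csv files are invented purely so the code can run anywhere.

```r
# Reduce() folds a two-argument function along a list or vector:
# ((1 + 2) + 3) + 4
total <- Reduce(`+`, 1:4)
total

# accumulate = TRUE keeps every intermediate result
running <- Reduce(`+`, 1:4, accumulate = TRUE)
running

# hypothetical demo folder holding two small csv files
folder <- file.path(tempdir(), "myfolder")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, x = c("a", "b")),
          file.path(folder, "one.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, x = c("c", "d")),
          file.path(folder, "two.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv can open from any working directory
filenames <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
all_files <- Reduce(rbind, lapply(filenames, read.csv))
all_files
```

Each step of the fold binds one more data.frame onto the rows collected so far, which is why every file needs the same columns.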
the_data[-1] drops the first column, where the_data is your data.frame. This is equivalent to the_data[,-1].
To drop columns by name, use library(dplyr) and the select() function, which lets you specify dropped columns like select(-notthisone, -notthisonetoo), or in base R use setdiff() against the names:
names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
new_data <- mtcars[ , setdiff(names(mtcars), c("mpg","disp","drat"))]
names(new_data)
## [1] "cyl" "hp" "wt" "qsec" "vs" "am" "gear" "carb"
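The column-dropping approaches above can be compared side by side. This is a rough sketch on the first rows of mtcars; the variable names are illustrative, and the dplyr part only runs if that package happens to be installed.

```r
small <- head(mtcars, 3)

# negative index: drop the first column (mpg)
by_position <- small[-1]
same_columns <- identical(names(by_position), names(small[, -1]))

# base R: keep every name except the ones listed
by_name <- small[, setdiff(names(small), c("mpg", "disp", "drat"))]
names(by_name)

# dplyr equivalent, run only when the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  via_dplyr <- dplyr::select(small, -mpg, -disp, -drat)
  print(identical(names(via_dplyr), names(by_name)))
}

# caveat: when only one column is left, the comma form returns a vector
two_cols <- small[c("mpg", "cyl")]
is.data.frame(two_cols[-1])     # still a data.frame
is.data.frame(two_cols[, -1])   # simplified to a plain vector
```

Because of that last caveat, the single-bracket form without a comma is the safer choice when the result must stay a data.frame.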