To understand computations in R, two slogans are helpful:
1. Everything that exists is an object.
2. Everything that happens is a function call.
John Chambers, creator of the S language (R's predecessor) and member of the R core team.
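Here is a concrete version of the two slogans. In R, operators, indexing and even assignment are ordinary functions that can be called by name using backticks, and functions are objects like any other value. This is a minimal base-R sketch that needs no packages. The variable names `x`, `y` and `f` are only illustrative.

```r
# Every operation in R is a function call, including operators and indexing.
x <- c(10, 20, 30)

# Infix + is an ordinary function and can be called by name.
`+`(1, 2)          # same as 1 + 2, returns 3

# Subsetting with [ is also a function call.
`[`(x, 2)          # same as x[2], returns 20

# Even assignment is a function: this is the same as y <- 5.
`<-`(y, 5)
y

# Functions are themselves objects: they have a class and can be stored in variables.
f <- sum
class(f)           # "function"
f(x)               # 60
```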
Thanks to its excellent packages and tooling, R really shines in three areas of data science: data acquisition/manipulation, data visualization, and statistics/machine learning. With over 10,000 packages on CRAN, I thought it would be awesome to describe data science through R packages. In practice we don’t need all ten thousand (many of them overlap in functionality); a handful of them covers most data science work. Here we go!
Core + productivity tools:
See also: Quick list of useful R packages.
- glmnet (supports sparse matrices and cross-validation; cf. glm from stats)
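The next example shows the two glmnet features mentioned above, sparse-matrix input and built-in cross-validation. It is a minimal sketch that assumes the glmnet and Matrix packages are installed. The design matrix and coefficients are simulated for illustration only, so they are not results from real data. `cv.glmnet()` accepts a sparse `dgCMatrix` directly, while `glm()` from stats expects a dense model matrix.

```r
library(Matrix)
library(glmnet)

set.seed(42)
n <- 200
p <- 50

# Simulated sparse design matrix: about 10% of entries are non-zero (class dgCMatrix).
X <- rsparsematrix(n, p, density = 0.1)

# Only the first 5 predictors matter in this toy example.
beta <- c(rep(2, 5), rep(0, p - 5))
y <- as.vector(as.matrix(X %*% beta)) + rnorm(n)

# Lasso (alpha = 1) fitted on the sparse matrix; lambda is chosen by 10-fold cross-validation.
cvfit <- cv.glmnet(X, y, alpha = 1, nfolds = 10)

# lambda.min minimises CV error; lambda.1se is the most regularised model within one SE of it.
cvfit$lambda.min
cvfit$lambda.1se
coef(cvfit, s = "lambda.1se")
```

In practice, `lambda.1se` is often preferred when you want a sparser, more conservative model.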
When it comes to text analytics, several packages do essentially the same thing, and choosing among them often comes down to personal style. My favourites are:
See also: Data Visualization with R
Common packages for data preparation:
- data.table: tutorial, examples
- zoo: forward fill
- timezone handling: tz list
- fread (from data.table)
- read.xlsx (from xlsx)
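The data-preparation items above can be combined in one short pipeline. This is a minimal sketch that assumes the data.table and zoo packages are installed. The price series is a made-up example. It reads CSV text with `fread()`, uses data.table's `dt[i, j, by]` syntax, forward-fills gaps within each group with `zoo::na.locf()`, and converts timestamps between time zones using base R (`OlsonNames()` lists the valid zone names). `read.xlsx` is left out because the xlsx package depends on a Java runtime.

```r
library(data.table)
library(zoo)

# Hypothetical sample data: a price series with gaps (empty numeric fields become NA).
csv <- "id,ts,price
1,2020-03-01 09:00:00,10.5
1,2020-03-01 10:00:00,
2,2020-03-01 09:00:00,7.0
2,2020-03-01 10:00:00,"

# fread() also accepts literal text via text=; read ts as character, then convert explicitly.
dt <- fread(text = csv, colClasses = list(character = "ts"))
dt[, ts := as.POSIXct(ts, tz = "UTC")]   # := modifies the column by reference

# Sort by group and time so that "last observation" is well defined.
setorder(dt, id, ts)

# Forward fill within each id; na.rm = FALSE keeps the result the same length as the input.
dt[, price_filled := na.locf(price, na.rm = FALSE), by = id]

# Time zones: check a zone name is valid, then render the UTC timestamps in that zone.
print("America/New_York" %in% OlsonNames())
dt[, ts_ny := format(ts, tz = "America/New_York", usetz = TRUE)]

# Grouped aggregation in the j slot.
summary_dt <- dt[, .(mean_price = mean(price_filled)), by = id]
print(dt)
print(summary_dt)
```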
There are also packages that bundle datasets:
How about handling very large datasets? Consider bigmemory and ff, which store data in shared memory or in files on disk rather than in R's ordinary memory.
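The next example is a minimal sketch of both packages. It assumes bigmemory and ff are installed, and the sizes are arbitrary. A `big.matrix` is allocated outside R's garbage-collected heap. An `ff` vector is backed by a file, and its data are paged into memory only as they are accessed. Both objects support ordinary `[` indexing.

```r
library(bigmemory)
library(ff)

set.seed(1)
n <- 1e5

# bigmemory: a big.matrix lives outside R's heap.
# filebacked.big.matrix() (or the backingfile/backingpath arguments) would put it on disk instead.
bm <- big.matrix(nrow = n, ncol = 3, type = "double", init = 0)
bm[, 1] <- rnorm(n)      # assignment works like an ordinary matrix
dim(bm)
mean(bm[, 1])            # bm[, 1] copies just that column into regular R memory

# ff: a file-backed vector; only the parts you touch are brought into RAM.
v <- ff(vmode = "double", length = 1e6)
v[1:3] <- c(1, 2, 3)
v[1:3]
length(v)
```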
If you’re just getting started, try this: learning-r-in-seven-simple-steps.
- Rtools (potential issues: gcc)
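A quick way to look into the gcc issue mentioned above is to check, from inside R, whether the build tools can be found on the PATH. This is a minimal base-R sketch. It assumes that the relevant tools are called `make` and `gcc`. The hint in the message reflects the usual Windows setup, where these tools come from Rtools, and is not an official diagnostic.

```r
# Sys.which() returns the full path of each executable, or "" if it is not on the PATH.
tools_found <- Sys.which(c("make", "gcc"))
print(tools_found)

missing_tools <- names(tools_found)[tools_found == ""]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "),
          ". On Windows, install Rtools and make sure its bin directory is on PATH.")
} else {
  message("make and gcc were found on PATH.")
}
```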
Other useful resources