Data Science with R


To understand computations in R, two slogans are helpful: 1. Everything that exists is an object. 2. Everything that happens is a function call.

 John Chambers, creator of S and R core team member.
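Both slogans can be checked directly at the R prompt. The snippet below is a minimal sketch in base R; the variable names (x, a, b, second, y) are illustrative only and do not come from the quote.

```r
# Slogan 1: everything that exists is an object.
# Vectors, functions and even operators are objects with a class.
class(1:3)    # an integer vector
class(sum)    # a function
class(`+`)    # an operator is a function too

# Slogan 2: everything that happens is a function call.
# Infix operators, indexing and assignment are ordinary calls in disguise.
x <- c(10, 20, 30)
a <- 1 + 2             # usual infix form
b <- `+`(1, 2)         # the same call written in prefix form
second <- `[`(x, 2)    # equivalent to x[2]
`<-`(y, 5)             # equivalent to y <- 5

identical(a, b)        # the two forms give the same result
```

Writing operators in backticks is rarely useful in everyday code, but it makes the second slogan concrete: the infix syntax is just a more readable way to spell a function call.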

Thanks to its excellent packages and tooling, R really shines in three areas of data science: data acquisition/manipulation, data visualization, and statistics/machine learning. With over 10,000 packages on CRAN, I thought it would be awesome to describe data science through R packages. In fact, we don’t need all ten thousand (many packages overlap in functionality); a handful of them is enough for most data science work. Here we go!

Core + productivity tools:

See also: Quick list of useful R packages.


Statistics/Machine Learning:

Text Analytics:

When it comes to text analytics, several packages do essentially the same thing, and the choice often comes down to personal preference. My favourites are:

  • NLP
  • tm
  • SnowballC
  • text2vec
  • quanteda
  • wordcloud
  • topicmodels


See also: Data Visualization with R

Data Preparation

Common packages for data preparation:

Getting data:

  • fread (from data.table)
  • read.csv
  • read.csv2
  • scan
  • readLines
  • read.xlsx (from xlsx)
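As a rough illustration of how the readers above differ, here is a minimal sketch that writes two small temporary files and reads them back. The file contents and variable names are made up for the example. fread runs only if data.table happens to be installed, and read.xlsx appears only in a comment because it needs the xlsx package and an actual spreadsheet.

```r
# Create a small comma-separated file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,ann,3.5", "2,bob,4.0"), path)

# read.csv: fields separated by "," and "." as the decimal mark.
df <- read.csv(path, stringsAsFactors = FALSE)

# read.csv2: fields separated by ";" and "," as the decimal mark,
# the convention used in many European locales.
path2 <- tempfile(fileext = ".csv")
writeLines(c("id;score", "1;3,5", "2;4,0"), path2)
df2 <- read.csv2(path2)

# readLines: returns the raw lines as a character vector, no parsing.
lines <- readLines(path)

# scan: low-level reader; here it parses numbers from a string.
nums <- scan(text = "1 2 3 4", quiet = TRUE)

# fread (data.table): detects the separator automatically and is
# typically much faster on large files. Skipped if not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(path)
}

# read.xlsx (xlsx package) reads one sheet of an Excel workbook, e.g.
#   xlsx::read.xlsx("report.xlsx", sheetIndex = 1)
# "report.xlsx" is a hypothetical file name, so the call is not run here.

str(df)
str(df2)
```

For small files the base readers are usually enough; fread tends to pay off once files reach hundreds of megabytes.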


There are also dataset packages available:

How about handling very large datasets? Consider bigmemory and ff:


If you’re just getting started, try this: learning-r-in-seven-simple-steps.


Other useful resources

