Getting Started with Docker for Data Science and Deploying a Shiny App

In this post, I will go over the basics of Docker containers, touch on using Docker for data science, and finally show how to deploy a simple Shiny app. I assume no prior knowledge of Docker and also show you how to install Docker on your laptop. In essence, the tutorial that follows is meant to be a self-contained and (hopefully) actionable getting-started guide!

Introduction

Docker is an open-source software container platform. It creates containers on top of an operating system using Linux kernel features, thereby virtualizing the operating system instead of the physical hardware (making it more portable and efficient than Virtual Machines).

Why Docker?

  1. Easy to use: it’s “build once, run anywhere”, meaning you can build an application on your laptop and it can run unmodified on any server or cloud
  2. Fast: containers share the host OS kernel and take up fewer resources than VMs, so a container is lightweight and starts almost instantly!
  3. Rich in ecosystem: Docker Hub alone has hundreds of thousands of public images, community-created and readily available for use (see next section). There are other Docker registry hosting services too (e.g. Quay).
  4. Scalable: break down your application into multiple containers for modularity, then you can link them together; scale easily by adding in new containers or destroying unused ones independently.


Containerized Data Science

Thanks to the rich ecosystem, there are already several readily available images for the common components in data science pipelines. Here are some Docker images to help you quickly spin up your own data science pipeline:

Example: Building a Shiny app

In order to build a Docker image for our data science application, we will need a Dockerfile (see the template below). To build the image and tag it as kevinsis/myapp version 1.0.0, navigate to the directory where the Dockerfile is located and run:

$ docker build -t kevinsis/myapp:1.0.0 .

A sample Dockerfile can be found below. Here I also include the installation of the GNU Scientific Library (used by some NLP packages in R). You will need the following in the directory where the above docker build command is run:

  • Dockerfile
  • shiny-server.conf and shiny-server.sh
  • a directory containing the ui.R and server.R (here named ‘myapp’)
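Before building, it can help to confirm that the build directory really contains these files, since a missing one only shows up when the corresponding COPY step fails partway through the build. Below is a minimal sketch of such a check. It is my own addition, not part of the original setup: the function name check_build_context is made up for illustration, and it assumes the app directory is called myapp as in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): report which of the files
# expected by the Dockerfile are missing from a build directory.
# Usage: check_build_context [DIR]   (DIR defaults to the current directory)
check_build_context() {
    dir="${1:-.}"
    missing=0
    for f in Dockerfile shiny-server.conf shiny-server.sh myapp/ui.R myapp/server.R; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

With this in place, something like `check_build_context . && docker build -t kevinsis/myapp:1.0.0 .` would only start the build when nothing is missing.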

FROM r-base:latest

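# MAINTAINER is deprecated in current Docker releases; a LABEL carries the same information
LABEL maintainer="Kevin Siswandi <siswandi.kevin@gmail.com>"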

ENV http_proxy ""
ENV https_proxy ""

RUN apt-get update && apt-get install -y \
 sudo \
 gdebi-core \
 pandoc \
 pandoc-citeproc \
 libcurl4-gnutls-dev \
 libcairo2-dev/unstable \
 libxt-dev \
 libssl-dev \
 gsl-bin \
 libgsl0-dev

# Download and install shiny server
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
 VERSION=$(cat version.txt) && \
 wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
 gdebi -n ss-latest.deb && \
 rm -f version.txt ss-latest.deb

RUN R -e "install.packages(c('shiny', \
'shinydashboard', \
'dplyr', \
'ggplot2'), repos='http://cran.rstudio.com/')"

COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY myapp /srv/shiny-server/

EXPOSE 3838

COPY shiny-server.sh /usr/bin/shiny-server.sh

CMD ["/usr/bin/shiny-server.sh"]

Briefly, the Dockerfile above installs the required system dependencies and Shiny Server, then installs the required R packages and copies the Shiny Server configuration (shiny-server.conf) into the image. The Shiny ui.R and server.R files are located in the ‘myapp’ directory (you can change this) and are copied over to /srv/shiny-server in the Docker image. Note that if you are behind a corporate proxy, you need to fill in your proxy settings in the two ENV lines for this to work.

After the image is built, you can run it as follows (e.g. an image named kevinsis/myapp tagged 1.0.0):

$ docker run --rm -p 3838:3838 kevinsis/myapp:1.0.0

If you have data to attach to the container (like me), you can mount a host directory as a volume:

$ docker run -p 3838:3838 -v /home/kevinsis/dockerizedShiny:/srv/shiny-server/data kevinsis/myapp:1.0.0
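
The two docker run invocations above differ only in the volume mount, so they can be folded into one small wrapper. Treat the following as a sketch under stated assumptions rather than a recommended tool: the function name run_shiny and the DRY_RUN variable are my own inventions, and I add --rm in both cases. The image name, container port, and container data path come from the commands above. Docker expects an absolute host path for this kind of mount, so the sketch resolves the data directory first.

```shell
#!/bin/sh
# Hypothetical wrapper around `docker run` for the Shiny image.
# Usage: run_shiny IMAGE [HOST_PORT] [DATA_DIR]
# Set DRY_RUN=1 to print the command instead of executing it.
run_shiny() {
    image="$1"
    host_port="${2:-3838}"
    data_dir="${3:-}"

    # -p maps HOST_PORT on the host to port 3838 inside the container.
    set -- docker run --rm -p "${host_port}:3838"

    if [ -n "$data_dir" ]; then
        # Resolve the host directory to an absolute path before mounting it.
        abs_dir="$(cd "$data_dir" && pwd)" || return 1
        set -- "$@" -v "${abs_dir}:/srv/shiny-server/data"
    fi

    set -- "$@" "$image"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 run_shiny kevinsis/myapp:1.0.0 8080 ./data` prints the docker command it would run, with the app served on host port 8080 instead of 3838, without starting a container.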

Next, perhaps you may want to:

Appendix: Installing Docker

Note: The instructions below are taken from a repository by DataKindSG that I contributed to: https://github.com/DataKind-SG/contain-yourself

… for Windows

Follow the setup instructions here: https://docs.docker.com/docker-for-windows/install/

Note: If your machine doesn’t meet the requirements for “Docker for Windows”, try setting up “Docker Toolbox”: https://docs.docker.com/toolbox/toolbox_install_windows/

… for Linux

Follow the setup instructions for your flavor of Linux here: https://docs.docker.com/engine/installation/linux/

… for macOS

Follow the setup instructions here: https://store.docker.com/editions/community/docker-ce-desktop-mac

Or, if you use Homebrew Cask (recent Homebrew versions use the --cask flag in place of the old brew cask subcommand):

$ brew install --cask docker
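
Whichever route you take, a quick sanity check confirms that the docker client ended up on your PATH. The sketch below is an addition of mine, and the function name docker_ready is hypothetical. It relies only on the shell's `command -v` and on `docker --version`.

```shell
#!/bin/sh
# Hypothetical post-install check: is the docker CLI available on PATH?
docker_ready() {
    if command -v docker >/dev/null 2>&1; then
        # Print the installed client version.
        docker --version
    else
        echo "docker not found on PATH" >&2
        return 1
    fi
}
```

If the check succeeds, `docker run hello-world` is the usual next step: it pulls Docker's official test image and prints a confirmation message, which also shows that the daemon is running.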
