I wanted to containerise a pipeline of code that was predominantly developed in Python but has a dependency on a model that was trained in R.
I succeeded by using a base Ubuntu image with the r-base package and python3.6 installed, plus a lot of troubleshooting on the web! I’m blogging this to share what I learned, because online help on the topic still seems sparse, and I hope it helps others in their DevOps efforts.
To simplify, suppose I have an R script that runs the model (a random forest), and it needs to be part of a data pipeline that was built in Python. The Python pipeline does some processing first and generates input for the model, then executes the R script with that input, before taking the output to the next stage of the Python pipeline. So we’ll create a template for this process by writing a simple test Python function that calls an R script, and put it in a Docker container to demonstrate the capability.
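For the real pipeline, the hand-off described above might look roughly like the sketch below. It assumes the R script accepts an input CSV path and an output CSV path as command-line arguments (readable in R with `commandArgs(trailingOnly = TRUE)`). That interface, the function name `run_model_stage`, and the file names are illustrative assumptions, not part of the original pipeline.

```python
import csv
import subprocess
import tempfile
from pathlib import Path


def run_model_stage(rows, command):
    """Write rows to a CSV, run an external command on it, and read back its CSV output.

    `command` is the program plus script, e.g. ["Rscript", "run_rf_model.R"].
    The input and output paths are appended as two extra arguments, which is
    an assumed interface for this sketch.
    """
    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "model_input.csv"
        output_path = Path(tmp) / "model_output.csv"

        # Stage 1: the Python pipeline writes the model input.
        with open(input_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)

        # Stage 2: the external script reads the input and writes the output.
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run([*command, str(input_path), str(output_path)], check=True)

        # Stage 3: read the result back for the next Python stage.
        with open(output_path, newline="") as f:
            return list(csv.reader(f))
```

A call such as `run_model_stage(rows, ["Rscript", "run_rf_model.R"])` would then return the model output as a list of rows, provided the R script follows the assumed two-argument convention.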
Below is the test Python code “test_call_r.py”, which uses the subprocess module to execute the R script “run_rf_model.R” containing the random forest model (as if the R script were run from the command line outside the pipeline):
<pre class="wp-block-preformatted"># test_call_r.py
import subprocess


def call_r():
    print('Calling R')
    # Rscript must be on the PATH (or be given as a full path) for subprocess to find it:
    subprocess.call(['Rscript', 'run_rf_model.R'])
    print('Finished calling R')


call_r()</pre>
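A slightly more defensive variant of this call is sketched below: it looks up the interpreter on the PATH first and raises an error if the script fails, rather than carrying on silently. The function name `call_script` and its parameters are hypothetical names chosen for illustration; `shutil.which` and `subprocess.run` come from the Python standard library.

```python
import shutil
import subprocess


def call_script(script, interpreter="Rscript"):
    """Run `script` with `interpreter` and return the exit code (0 on success)."""
    # Resolve the interpreter on the PATH so a missing R install fails clearly.
    executable = shutil.which(interpreter)
    if executable is None:
        raise FileNotFoundError(f"{interpreter} was not found on the PATH")

    print(f"Calling {interpreter}")
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    completed = subprocess.run([executable, script], check=True)
    print(f"Finished calling {interpreter}")
    return completed.returncode
```

With this version, `call_script("run_rf_model.R")` stops the pipeline as soon as R is missing or the model script errors, which makes problems inside a container easier to spot.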
The Dockerfile I built to run R and Python together is:
<pre class="wp-block-preformatted">FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential r-base r-cran-randomforest python3.6 python3-pip python3-setuptools python3-dev
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r requirements.txt
RUN Rscript -e "install.packages('data.table')"
COPY . /app</pre>
The commands to build the image, run the container (naming it SnakeR), and execute the code are:
<pre class="wp-block-preformatted">docker build -t my_image .
docker run -it --name SnakeR my_image
docker exec SnakeR /bin/sh -c "python3 test_call_r.py"</pre>
I treated the image like an Ubuntu OS, and the Dockerfile builds it as follows:
- suppress the prompts for choosing your location during the R install;
- refresh the apt package index (apt-get update);
- install packages with apt-get install, using:
  - the -y flag to answer yes to prompts asking whether to proceed (e.g. confirming the extra disk space);
  - the --no-install-recommends flag to skip recommended and suggested packages and install only the required dependencies;
  - build-essential for the compilers and build tools that some packages need on Ubuntu;
  - r-base for the R software;
  - r-cran-randomforest to make the randomForest package available (installing it with install.packages, as done for data.table, didn’t work for some reason);
  - python3.6 for that version of Python;
  - python3-pip so that pip can be used to install the requirements;
  - python3-setuptools, which pip needs when building packages from source;
  - python3-dev so that the JayDeBeApi installation in the requirements succeeds (without it, the install seemed to assume Python 2 rather than Python 3);
- specify the active “working directory” to be the /app location;
- copy the requirements file that holds the python dependencies (built from the virtual environment of the Python codebase, e.g., with pip freeze);
- install the Python packages from the requirements file (pip3 for Python3);
- install the R packages (e.g. just data.table here);
- copy the directory contents to the specified working directory /app.
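Once the image is built, a quick check from Python can confirm that both runtimes made it in. The sketch below uses a made-up helper name, `missing_tools`, and only reports which executables are absent from the PATH; it does not verify that R packages such as randomForest actually load.

```python
import shutil


def missing_tools(tools=("python3", "pip3", "Rscript")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from the image:", ", ".join(missing))
    else:
        print("Python and R executables are both available")
```

Saved as a file in /app, it could be run inside the container with docker exec in the same way as test_call_r.py.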
As a relative n00b to Docker, having only used some templates as a Python user (FROM python:3), I had to remind myself that a container can be treated much like a lightweight virtual machine hosting an OS, such as Ubuntu, on which any requirements can be installed. So having Python and R beside each other in one container should never have looked like a challenge.

After some web-searching I was under the impression that it might not be so straightforward, and I couldn’t find examples of others who had done it already. But with some perseverance, and by troubleshooting each error message as it arose, I eventually built a Docker image containing both R and Python - and it worked! So this is me sharing my process in case it helps anyone else new to Docker who needs something like it. Enjoy!
p.s. Thanks to colleagues who entertained my chats about this - their experiences and discussion helped it happen.