An introduction to Docker for reproducible research, with examples from the R environment
Carl Boettiger

TL;DR
This paper introduces Docker as a tool to enhance computational reproducibility in scientific research, demonstrating its application with R and discussing its advantages over traditional virtual machines and workflow systems.
Contribution
It provides an overview of Docker's capabilities for reproducible research and offers practical examples using the R environment, highlighting its benefits and limitations.
Findings
Docker improves reproducibility and portability of research environments.
Docker simplifies sharing and extending computational analyses.
Examples demonstrate Docker's effectiveness with R in research workflows.
Abstract
As computational work becomes increasingly integral to many aspects of scientific research, computational reproducibility has become an issue of growing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straightforward than replicating physical experiments, the complex and rapidly changing nature of computer environments makes reproducing and extending such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research - such as operating system virtualization,…
Taxonomy
Topics: Scientific Computing and Data Management · Data Analysis with R · Research Data Management Practices
