Caching and Reproducibility: Making Data Science experiments faster and FAIRer

Moritz Schubotz; Ankit Satpute; Andre Greiner-Petter; Akiko Aizawa; Bela Gipp

arXiv:2211.04049 · cs.SE · November 10, 2022

TL;DR

This paper advocates integrating caching into the development of data science research software to speed up experiments, improve reproducibility, and strengthen adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) principles, thereby reducing the time and effort required of future researchers.

Contribution

The paper introduces caching recommendations tailored to research software and emphasizes integrating caching early to improve reproducibility and FAIR compliance in data science experiments.

Findings

1. Caching improves experiment reproducibility.

2. Caching reduces computational time and effort.

3. The recommendations are demonstrated on a mathematical information retrieval project.

Abstract

Small to medium-scale data science experiments often rely on research software developed ad hoc by individual scientists or small teams, and there is frequently no time to make that software fast, reusable, and openly accessible. The consequence is twofold. First, subsequent researchers must spend many work hours to build upon the proposed hypotheses or experimental framework; in the worst case, others cannot reproduce the experiment or reuse its findings for subsequent research. Second, if the ad-hoc research software fails during a long-running, computationally expensive experiment, the effort of iteratively fixing the software and rerunning the experiment puts researchers under significant time pressure. We suggest making caching an integral part of the research software development process, even before the first line of code is written. This article…