revisit: a Workflow Tool for Data Science
Norman Matloff, Reed Davis, Laurel Beckett, Paul Thompson

TL;DR
The revisit package addresses the scientific reproducibility crisis by creating a transparent, replayable record of statistical analyses and by warning of potential errors in statistical methodology.
Contribution
It introduces a tool that generates a reproducible software paper trail of statistical operations, aiding verification and methodological transparency.
Findings
Provides a replayable record of data analysis steps
Issues warnings for potential statistical errors
Improves transparency and reproducibility in scientific research
Abstract
In recent years there has been widespread concern in the scientific community over a reproducibility crisis. One of the major causes that have been identified is statistical: in many scientific studies, the statistical analysis (including data preparation) suffers from a lack of transparency and from methodological problems, both major obstructions to reproducibility. The revisit package aims to remedy this problem by generating a "software paper trail" of the statistical operations applied to a dataset. This record can be "replayed" for verification purposes, and can also be modified to enable alternative analyses. The software also issues warnings about certain kinds of potential errors in statistical methodology, which likewise bear on reproducibility.
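The "paper trail" idea in the abstract can be illustrated with a small conceptual sketch. This is not the revisit package's API: the class `PaperTrail`, its methods, and the toy dataset are all hypothetical names invented here, and the particular warning shown (flagging more than one inference step) is only an assumed example of the kind of methodological check the abstract mentions.

```python
import warnings
from statistics import mean


class PaperTrail:
    """Hypothetical recorder of analysis steps; NOT the revisit API."""

    def __init__(self):
        self.steps = []        # ordered (label, function) pairs
        self.n_inference = 0   # how many steps were flagged as inference

    def record(self, label, func, inference=False):
        """Append a step; warn once several inference steps pile up."""
        self.steps.append((label, func))
        if inference:
            self.n_inference += 1
            if self.n_inference > 1:
                # Assumed example check, chosen for illustration only.
                warnings.warn(
                    f"{self.n_inference} inference steps recorded; "
                    "consider a multiple-comparisons adjustment",
                    UserWarning,
                )
        return self

    def replay(self, data):
        """Re-run every recorded step, in order, on the given data."""
        for _label, func in self.steps:
            data = func(data)
        return data

    def with_step_replaced(self, index, label, func):
        """Return a copy of the trail with one step swapped out."""
        alt = PaperTrail()
        alt.steps = list(self.steps)
        alt.n_inference = self.n_inference
        alt.steps[index] = (label, func)
        return alt


# Toy dataset, invented for this example.
raw = [2.0, 4.0, None, 6.0, 100.0]

trail = PaperTrail()
trail.record("drop missing values", lambda xs: [x for x in xs if x is not None])
trail.record("remove values above 50", lambda xs: [x for x in xs if x <= 50])
trail.record("compute mean", mean)

# Replaying the trail reproduces the original analysis.
original = trail.replay(raw)

# An alternative analysis: keep the outlier instead of removing it.
alternative = trail.with_step_replaced(1, "keep outliers", lambda xs: xs).replay(raw)

print(original, alternative)
```

Because every data-preparation and analysis step is stored explicitly, a reader can see that the outlier rule was an analyst's choice and check how much the result depends on it.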
Taxonomy
Topics: Recommender Systems and Techniques · Data Mining Algorithms and Applications · Data Stream Mining Techniques
