Principles for data analysis workflows
Sara Stoudt, Valeri N. Vasquez, Ciera C. Martinez

TL;DR
This paper outlines basic principles for designing reproducible data analysis workflows. It organizes the workflow into phases, each defined by the audience to whom research decisions and results are being communicated, and draws parallels with software development to improve data science practice.
Contribution
It introduces a structured framework of three workflow phases—Exploratory, Refinement, and Polishing—and discusses their role in reproducibility and effective communication in data analysis.
Findings
Defines three core workflow phases for data analysis.
Highlights the importance of audience-centered communication.
Provides guidance to improve reproducibility and research quality.
Abstract
Traditional data science education often omits training on research workflows: the process that moves a scientific investigation from raw data to coherent research question to insightful contribution. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining three phases: the Exploratory, Refinement, and Polishing Phases. Each workflow phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between principles for data-intensive research workflows and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to…
