FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows
Sonia Natalie Mitchell, Andrew Lahiff, Nathan Cummings, Jonathan Hollocombe, Bram Boskamp, Ryan Field, Dennis Reddyhoff, Kristian Zarebski, Antony Wilson, Bruno Viola, Martin Burke, Blair Archibald, Paul Bessell, Richard Blackwell, Lisa A Boden, Alys Brett, Sam Brett

TL;DR
This paper presents a FAIR data pipeline that enhances transparency and traceability in scientific workflows, particularly during epidemiological crises like COVID-19, by enabling provenance tracking from data to outputs.
Contribution
The paper introduces a provenance-driven data management tool that integrates FAIR principles into epidemiological workflows, improving transparency and trust in scientific evidence.
Findings
Enables annotation of data during analysis
Allows tracing of outputs back to original data sources
Supports transparent policy decision-making
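To make the findings above concrete, here is a minimal sketch of provenance tracking in Python. It is not the FAIR Data Pipeline's actual API: the `ProvenanceLog` class, its methods, and the data product names are all hypothetical, chosen only to illustrate annotating data as it is consumed and tracing an output back to its primary sources. It assumes each data product can be identified precisely by a hash of its contents.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(data: bytes) -> str:
    """Identify a data product by the SHA-256 digest of its contents."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory registry; not the paper's implementation."""

    # Content hash -> annotation recorded when the product was registered.
    products: dict = field(default_factory=dict)
    # Output hash -> hashes of the inputs it was derived from.
    derived_from: dict = field(default_factory=dict)

    def register(self, name: str, data: bytes, description: str = "") -> str:
        """Annotate a data product and return its content hash."""
        digest = content_hash(data)
        self.products[digest] = {"name": name, "description": description}
        return digest

    def record_run(self, input_hashes, output_hash) -> None:
        """Record that one analysis run turned the inputs into the output."""
        self.derived_from[output_hash] = list(input_hashes)

    def trace(self, output_hash) -> list:
        """Return the names of primary sources (products with no recorded
        inputs) that the given output ultimately depends on."""
        sources, stack, seen = [], [output_hash], set()
        while stack:
            digest = stack.pop()
            if digest in seen:
                continue
            seen.add(digest)
            parents = self.derived_from.get(digest)
            if parents:
                stack.extend(parents)
            else:
                sources.append(self.products[digest]["name"])
        return sorted(sources)


# Illustrative data only: a raw feed is cleaned, then combined with
# model parameters to produce a forecast.
log = ProvenanceLog()
raw = log.register("case_counts_raw", b"day,cases\n1,10\n2,15\n", "raw feed")
params = log.register("model_params", b"r0=2.5\n")
cleaned = log.register("case_counts_clean", b"day,cases\n1,10\n2,15")
log.record_run([raw], cleaned)
forecast = log.register("forecast", b"day,cases\n3,22\n")
log.record_run([cleaned, params], forecast)

print(log.trace(forecast))  # ['case_counts_raw', 'model_params']
```

Keying products by content hash rather than by filename is one way to avoid the imprecise identification of data that the paper highlights: if a data stream changes during an outbreak, the new version gets a new identity instead of silently replacing the old one.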
Abstract
Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of "following the science" are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline developed during the COVID-19 pandemic that allows easy annotation of data as they are consumed by analyses,…
Taxonomy
Topics: Research Data Management Practices · Scientific Computing and Data Management · Academic Publishing and Open Access
