The craft and coordination of data curation: complicating "workflow" views of data science
Andrea K. Thomer, Dharma Akmon, Jeremy York, Allison R. B. Tyler, Faye Polasek, Sara Lafia, Libby Hemphill, Elizabeth Yakel

TL;DR
This paper examines the complex, craft-based nature of data curation at a social science data repository, showing that curators' craft practices are often invisible, are intertwined with best practices, and affect the usability and reproducibility of data.
Contribution
It provides a detailed, empirical analysis of data curation practices, emphasizing the craft and coordination involved; this account complicates conventional "workflow" models of data science.
Findings
Curatorial work involves craft practices that defy simple workflow models.
Best practices and craft practices are deeply intertwined.
The work of curators significantly impacts data usability and reproducibility.
Abstract
Data curation is the process of making a dataset fit-for-use and archivable. It is critical to data-intensive science because it makes complex data pipelines possible, makes studies reproducible, and makes data (re)usable. Yet the complexities of the hands-on, technical and intellectual work of data curation are frequently overlooked or downplayed. Obscuring the work of data curation not only renders the labor and contributions of the data curators invisible; it also makes it harder to tease out the impact curators' work has on the later usability, reliability, and reproducibility of data. To better understand the specific work of data curation -- and thereby, explore ways of showing curators' impact -- we conducted a close examination of data curation at a large social science data repository, the Inter-university Consortium for Political and Social Research (ICPSR). We asked, What does…
Taxonomy
Topics: Research Data Management Practices · Scientific Computing and Data Management · Data Analysis and Archiving
