Lost or found? Discovering data needed for research
Kathleen Gregory, Paul Groth, Andrea Scharnhorst, Sally Wyatt

TL;DR
This paper investigates how researchers discover and reuse data, providing empirical insights into data discovery practices, needs, and the role of social and literature-based strategies, with implications for designing better data systems.
Contribution
It presents evidence from the largest known survey of how researchers discover and reuse data they did not create themselves, proposes a typology for data reuse, and suggests practical design principles for data discovery systems and repositories.
Findings
Researchers rely on social interactions and literature searches for data discovery.
A typology of data reuse practices is proposed.
Design recommendations include designing for a diversity of practices, using communities of use as an entry point for design, and attending to the role of metadata.
Abstract
Finding data is a necessary precursor to being able to reuse data, although relatively little large-scale empirical evidence exists about how researchers discover, make sense of and (re)use data for research. This study presents evidence from the largest known survey investigating how researchers discover and use data that they do not create themselves. We examine the data needs and discovery strategies of respondents, propose a typology for data reuse and probe the role of social interactions and literature search in data discovery. We consider how data communities can be conceptualized according to data uses and propose practical applications of our findings for designers of data discovery systems and repositories. Specifically, we consider how to design for a diversity of practices, how communities of use can serve as an entry point for design and the role of metadata in supporting…
