Who Re-Uses Data? A Bibliometric Analysis of Dataset Citations
Geoff Krause, Madelaine Hare, Mike Smit, Philippe Mongeon

TL;DR
This study analyzes dataset citations from OpenAlex to understand data re-use patterns, revealing low citation rates, discipline and country differences, and the importance of sharing practices for open research.
Contribution
It provides the first comprehensive bibliometric analysis of dataset re-use, highlighting citation patterns and the relationship between data producers and users at the individual, institutional, and national levels.
Findings
Most datasets have no recorded citations.
Most cited datasets have only a single citation.
The US is the leading exporter of re-used datasets.
Abstract
Open data is receiving increased attention and support in academic environments, with one justification being that shared data may be re-used in further research. But what evidence exists for such re-use, and what is the relationship between the producers of shared datasets and researchers who use them? Using a sample of data citations from OpenAlex, this study investigates the relationship between creators and citers of datasets at the individual, institutional, and national levels. We find that the vast majority of datasets have no recorded citations, and that most cited datasets only have a single citation. Rates of self-citation by individuals and institutions tend towards the low end of previous findings and vary widely across disciplines. At the country level, the United States is by far the most prominent exporter of re-used datasets, while importation is more evenly distributed.…
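The two measures named in the abstract, self-citation and country-level export and import of datasets, can be illustrated with a minimal sketch. It is not the authors' method. The records are made-up toy data, the field names are hypothetical and do not follow the OpenAlex schema, and two definitions are assumed: a citation counts as an author self-citation when the dataset and the citing work share at least one author, and a dataset is "exported" when the citing work comes from a different country than the dataset.

```python
from collections import Counter

# Hypothetical toy records: each one is a single citation of a dataset by a paper.
# Field names are illustrative assumptions, not the OpenAlex schema.
citations = [
    {"dataset_authors": {"A1", "A2"}, "citing_authors": {"A2", "A3"},
     "dataset_country": "US", "citing_country": "CA"},
    {"dataset_authors": {"A4"}, "citing_authors": {"A5"},
     "dataset_country": "US", "citing_country": "DE"},
    {"dataset_authors": {"A6"}, "citing_authors": {"A6"},
     "dataset_country": "GB", "citing_country": "GB"},
]

def is_author_self_citation(record):
    """True when at least one author appears on both the dataset and the citing work."""
    return bool(record["dataset_authors"] & record["citing_authors"])

def self_citation_rate(records):
    """Share of citations with overlapping authorship (assumed definition)."""
    if not records:
        return 0.0
    return sum(is_author_self_citation(r) for r in records) / len(records)

def country_flows(records):
    """Count cross-border citations: exports by producing country, imports by citing country."""
    exports, imports = Counter(), Counter()
    for r in records:
        if r["dataset_country"] != r["citing_country"]:
            exports[r["dataset_country"]] += 1
            imports[r["citing_country"]] += 1
    return exports, imports

rate = self_citation_rate(citations)
exports, imports = country_flows(citations)
print(f"author self-citation rate: {rate:.2f}")
print("exports:", dict(exports))
print("imports:", dict(imports))
```

The same overlap test could be applied to institution identifiers instead of author identifiers to obtain an institutional self-citation rate. A real analysis would also need to handle multi-country authorship, which this sketch ignores.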
Taxonomy
Topics: Research Data Management Practices · Scientific Computing and Data Management · Scientometrics and Bibliometrics Research
