Research Data in Scientific Publications: A Cross-Field Analysis
Puyu Yang, Giovanni Colavizza

TL;DR
This study analyzes data-sharing practices across scientific disciplines, revealing trends, challenges, and opportunities for enhancing open science and data reuse, with a focus on dataset mentions, referencing, and temporal changes.
Contribution
It introduces a model to identify dataset mentions in publications and provides a cross-field analysis of data-sharing behaviors and challenges.
Findings
Data release is the most common sharing mode, especially in Commerce, Management, and the Creative Arts.
STEM fields, especially the Biological and Agricultural Sciences, show higher rates of data reuse.
Dataset referencing remains low across disciplines.
Abstract
Data sharing is fundamental to scientific progress, enhancing transparency, reproducibility, and innovation across disciplines. Despite its growing significance, the variability of data-sharing practices across research fields remains insufficiently understood, limiting the development of effective policies and infrastructure. This study investigates the evolving landscape of data-sharing practices, specifically focusing on the intentions behind data release, reuse, and referencing. Leveraging the PubMed open dataset, we developed a model to identify mentions of datasets in the full text of publications. Our analysis reveals that data release is the most prevalent sharing mode, particularly in fields such as Commerce, Management, and the Creative Arts. In contrast, STEM fields, especially the Biological and Agricultural Sciences, show significantly higher rates of data reuse. However,…
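A cross-field comparison like the one described above can be pictured as computing, for each field, the share of dataset mentions labeled as release, reuse, or referencing. The following is a minimal Python sketch of that tally, assuming each mention has already been classified by some upstream model (not shown). The function name `intent_shares`, the label strings, and the field names are hypothetical, the toy data is made up, and this is not the authors' pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the three sharing intentions named in the abstract.
INTENTS = ("release", "reuse", "reference")


def intent_shares(mentions):
    """Return {field: {intent: share}} from (field, intent) pairs.

    Each pair is assumed to be one dataset mention already labeled
    with a sharing intention by an upstream classifier.
    """
    counts = defaultdict(Counter)
    for field, intent in mentions:
        if intent not in INTENTS:
            raise ValueError(f"unknown intent: {intent!r}")
        counts[field][intent] += 1

    shares = {}
    for field, counter in counts.items():
        total = sum(counter.values())
        # Missing intents count as 0 because Counter returns 0 for absent keys.
        shares[field] = {i: counter[i] / total for i in INTENTS}
    return shares


# Toy, made-up mentions for illustration only; not data from the paper.
toy = [
    ("Field A", "release"), ("Field A", "release"), ("Field A", "reuse"),
    ("Field B", "reuse"), ("Field B", "reuse"), ("Field B", "reference"),
]
print(intent_shares(toy))
```

Normalizing within each field, rather than using raw counts, keeps fields with very different publication volumes comparable.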
Taxonomy
Topics: Semantic Web and Ontologies
