The Use of Scientific Data: A Content Analysis
Jian Zhang, Chaomei Chen, Michael S. Vogeley

TL;DR
This paper analyzes how researchers use scientific data in the data-intensive paradigm. Through a content analysis of Sloan Digital Sky Survey (SDSS) publications, it reveals usage patterns and challenges, as well as the dual role of users as both data consumers and producers.
Contribution
It provides an empirical analysis of data usage practices in scientific research, highlighting the diversity and limitations in data source utilization.
Findings
Nearly half of the studies used only one data source.
Users are both consumers and producers of data.
Limited use of multiple large-scale data sources due to trust and usability issues.
Abstract
Science is now entering a new paradigm, called data-intensive science. Current studies of this phenomenon have focused on building infrastructure for the new paradigm. However, few studies have examined the users of scientific data, particularly their usage practices in the emerging paradigm, even though researchers have called for a better understanding of users' workflows and practices. This study aims to improve our understanding of users' data usage behavior through a content analysis of publications from the Sloan Digital Sky Survey (SDSS), a frequently cited project associated with the new paradigm. We found that (1) nearly half of the studies used only one data source, and a few studies exploited three or more data sources; (2) the number of objects analyzed in SDSS publications spans all scales, from single digits to millions; (3) different paper types may affect the data usage…
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Data Mining Algorithms and Applications
