Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?
Yulin Yu, Daniel M. Romero

TL;DR
This study investigates whether unusual combinations of datasets in social science research lead to higher scientific impact, finding that infrequent dataset pairings are associated with increased citations and broader dissemination.
Contribution
It provides empirical evidence that combining rarely paired datasets enhances scientific and societal impact, highlighting an under-utilized strategy for research innovation.
Findings
Infrequently paired datasets are associated with significantly higher impact and broader dissemination.
Dataset topic atypicality has a smaller effect on citations.
Less experienced teams more frequently use unusual dataset combinations.
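To make "infrequently paired datasets" concrete, here is a minimal sketch of one way pair rarity could be counted. It is an illustration under stated assumptions, not the authors' actual measure: the function names (`pair_counts`, `rarest_pair_count`), the toy paper and dataset identifiers, and the choice to score a paper by its least common pair are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(papers):
    """Count how many papers use each unordered pair of datasets."""
    counts = Counter()
    for datasets in papers.values():
        # Sort and de-duplicate so (A, B) and (B, A) map to the same key.
        for pair in combinations(sorted(set(datasets)), 2):
            counts[pair] += 1
    return counts


def rarest_pair_count(datasets, counts):
    """Return the usage count of a paper's least common dataset pair.

    Returns None when the paper uses fewer than two datasets,
    since no pair exists to score. (Hypothetical scoring rule.)
    """
    pairs = list(combinations(sorted(set(datasets)), 2))
    if not pairs:
        return None
    return min(counts[p] for p in pairs)


# Toy data (made up for illustration): paper id -> datasets it uses.
papers = {
    "paper_a": ["D1", "D2"],
    "paper_b": ["D1", "D2"],
    "paper_c": ["D1", "D3"],
    "paper_d": ["D2"],
}

counts = pair_counts(papers)
scores = {pid: rarest_pair_count(ds, counts) for pid, ds in papers.items()}
```

In this toy example a lower score means a more unusual combination: `paper_c` pairs D1 with D3, a combination no other paper uses. A real analysis would likely normalize raw counts against how often each dataset is used overall, which this sketch does not attempt.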
Abstract
Scientific datasets play a crucial role in contemporary data-driven research, as they allow for the progress of science by facilitating the discovery of new patterns and phenomena. The mounting demand for empirical research raises important questions about how strategic data utilization in research projects can stimulate scientific advancement. In this study, we examine a hypothesis inspired by recombination theory, which suggests that innovative combinations of existing knowledge, including the use of unusual combinations of datasets, can lead to high-impact discoveries. Focusing on social science, we investigate the scientific outcomes of such atypical data combinations in more than 30,000 publications that leverage over 5,000 datasets curated within one of the largest social science databases, ICPSR. This study offers four important insights. First, combining datasets,…
Taxonomy
Topics: Data Analysis with R
