How to identify the roots of broad research topics and fields? The introduction of RPYS sampling using the example of climate change research
Robin Haunschild, Werner Marx, Andreas Thor, and Lutz Bornmann

TL;DR
This paper introduces sampling methods that extend the RPYS technique for identifying historical roots to broad research fields. The approach is demonstrated on climate change research and removes the computer memory limits that previously restricted such analyses.
Contribution
It presents sampling strategies and a scripting approach for RPYS, enabling analysis of larger datasets beyond previous memory constraints.
Findings
Systematic sampling yields the most accurate RPYS results.
Cluster sampling performs the worst among tested methods.
Random sampling provides good results, though they are less accurate than those from systematic sampling.
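The three sampling strategies compared above can be illustrated with a minimal Python sketch. This is not the CRExplorer script language and not the authors' implementation. The function names, the toy `records` list, and the choice of publication year as the clustering unit are assumptions made only for illustration.

```python
import random

def random_sample(records, n, seed=0):
    """Simple random sampling: draw n records without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, n)

def systematic_sample(records, n, start=0):
    """Systematic sampling: take every k-th record from a fixed ordering."""
    k = max(1, len(records) // n)  # sampling interval
    return records[start::k][:n]

def cluster_sample(records, n_clusters, key, seed=0):
    """Cluster sampling: draw whole groups at random and keep all their records."""
    rng = random.Random(seed)
    clusters = {}
    for rec in records:
        clusters.setdefault(key(rec), []).append(rec)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [rec for c in chosen for rec in clusters[c]]

# Toy population (hypothetical): 100 publications spread over 10 publication years.
records = [{"id": i, "year": 2000 + i % 10} for i in range(100)]

rs = random_sample(records, 10)
ss = systematic_sample(records, 10)
cs = cluster_sample(records, 2, key=lambda r: r["year"])
```

In this sketch, systematic sampling spreads the sample evenly across the ordered population, while cluster sampling concentrates it in a few groups. That difference is one plausible reason a cluster sample can represent the population less well, although the source text here does not state the cause.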
Abstract
Since the introduction of the reference publication year spectroscopy (RPYS) method and the corresponding program CRExplorer, many studies have been published revealing the historical roots of topics, fields, and researchers. The application of the method was restricted up to now by the available memory of the computer used for running the CRExplorer. Thus, many users could not perform RPYS for broader research fields or topics. In this study, we present various sampling methods to solve this problem: random, systematic, and cluster sampling. We introduce the script language of the CRExplorer which can be used to draw many samples from the population dataset. Based on a large dataset of publications from climate change research, we compare RPYS results using population data with RPYS results using different sampling techniques. From our comparison with the full RPYS (population…
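As a rough illustration of what an RPYS computes, the sketch below counts cited references per reference publication year and, following the commonly described RPYS convention, reports each year's deviation from the median of a five-year window (two years before, the year itself, two years after). The function name `rpys_spectrum` and the toy input are hypothetical. This is not CRExplorer code, and the windowing rule is stated here as an assumption rather than taken from the abstract.

```python
from collections import Counter
from statistics import median

def rpys_spectrum(cited_years):
    """Map each reference publication year to (count, deviation from 5-year median)."""
    counts = Counter(cited_years)  # missing years count as 0
    spectrum = {}
    for year in range(min(counts), max(counts) + 1):
        window = [counts[year + d] for d in range(-2, 3)]
        spectrum[year] = (counts[year], counts[year] - median(window))
    return spectrum

# Toy data (hypothetical): reference publication years of all cited references.
cited_years = [1950] + [1951] + [1952] * 5 + [1953] + [1954]
spectrum = rpys_spectrum(cited_years)
```

A pronounced positive deviation, as for 1952 in the toy data, is the kind of peak that an RPYS would flag as a candidate historical root. The same function could be applied to the cited references of a sample and of the full population to compare the resulting spectra.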
