PreprintResolver: Improving Citation Quality by Resolving Published Versions of ArXiv Preprints using Literature Databases
Louise Bloch, Johannes Rückert, and Christoph M. Friedrich

TL;DR
PreprintResolver is a tool that uses multiple literature databases to identify published versions of arXiv preprints, improving citation accuracy and helping users distinguish peer-reviewed research from preprints.
Contribution
This paper introduces PreprintResolver, a novel method combining four literature databases and fuzzy matching to resolve preprint-publication pairs for arXiv preprints.
Findings
Resolved 60.3% of sampled preprints without publication info
All four databases contributed to successful matches
Manual validation shows high plausibility of results
Abstract
The growing impact of preprint servers enables the rapid sharing of time-sensitive research. At the same time, it is becoming increasingly difficult to distinguish high-quality, peer-reviewed research from preprints. Although preprints are often later published in peer-reviewed journals, this information is frequently missing from preprint servers. To overcome this problem, PreprintResolver was developed; it uses four literature databases (DBLP, SemanticScholar, OpenAlex, and CrossRef / CrossCite) to identify preprint-publication pairs for the arXiv preprint server. The target audience includes, but is not limited to, inexperienced researchers and students, especially in the field of computer science. The tool is based on fuzzy matching of author surnames, titles, and DOIs. Experiments were performed on a sample of 1,000 arXiv preprints from the research field of computer science and…
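The fuzzy matching of author surnames, titles, and DOIs mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout (`title`, `surnames`, `doi`), the function names, the use of Python's `difflib.SequenceMatcher`, and both thresholds are assumptions chosen for illustration, and the DOIs in the usage note are made-up placeholders.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Strip accents, lowercase, replace punctuation with spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_plausible_match(preprint, candidate,
                       title_threshold=0.9, surname_threshold=0.8):
    """Decide whether `candidate` may be the published version of `preprint`.

    Both arguments are dicts with the hypothetical keys 'title' (str),
    'surnames' (list of str) and 'doi' (str or None). The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    # An identical DOI (compared case-insensitively) is treated as a direct hit.
    if preprint.get("doi") and candidate.get("doi"):
        if preprint["doi"].lower() == candidate["doi"].lower():
            return True

    # Otherwise the titles must be nearly identical after normalization.
    if similarity(preprint["title"], candidate["title"]) < title_threshold:
        return False

    # Every preprint surname needs a sufficiently similar candidate surname.
    return all(
        any(similarity(s, c) >= surname_threshold for c in candidate["surnames"])
        for s in preprint["surnames"]
    )
```

For example, a preprint record titled "PreprintResolver: Improving Citation Quality" with surnames Bloch, Rückert, and Friedrich would be accepted against a database record titled "Preprintresolver - improving citation quality." with surnames Bloch, Ruckert, and Friedrich, because normalization removes the differences in case, punctuation, and diacritics. A real pipeline would additionally query each database for candidates and decide how to handle changed author lists or retitled papers, which this sketch leaves out.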
Taxonomy
Topics: Academic Publishing and Open Access · Research Data Management Practices · Scientometrics and Bibliometrics Research
