How many preprints have actually been printed and why: a case study of computer science preprints on arXiv
Jialiang Lin, Yao Yu, Yu Zhou, Zhiyang Zhou, Xiaodong Shi

TL;DR
This study analyzes the publication rate of computer science preprints on arXiv from 2008 to 2017, introduces a BERT-based mapping method to link preprints with final publications, and identifies factors influencing publication success.
Contribution
It presents a novel semantics-based BERT method for accurately matching preprints to published papers and provides insights into the characteristics of preprints that are eventually published.
Findings
66% of preprints are published with unchanged titles
11% are published under different titles, along with other modifications
Published preprints tend to have more revisions, more authors, more detailed abstracts, more references, and available source code
Abstract
Preprints play an increasingly critical role in academic communities. There are many reasons driving researchers to post their manuscripts to preprint servers before formal submission to journals or conferences, but the use of preprints has also sparked considerable controversy, especially surrounding the claim of priority. In this paper, a case study of computer science preprints submitted to arXiv from 2008 to 2017 is conducted to quantify how many preprints have eventually been printed in peer-reviewed venues. Among those published manuscripts, some are published under different titles and without an update to their preprints on arXiv. In the case of these manuscripts, the traditional fuzzy matching method is incapable of mapping the preprint to the final published version. In view of this issue, we introduce a semantics-based mapping method with the employment of Bidirectional…
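The limitation of fuzzy title matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the helper names, the example titles, and the 0.9 threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the study used.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so trivial edits are ignored."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def fuzzy_title_score(preprint_title: str, published_title: str) -> float:
    """Character-level similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(
        None, normalize(preprint_title), normalize(published_title)
    ).ratio()


def is_fuzzy_match(preprint_title: str, published_title: str,
                   threshold: float = 0.9) -> bool:
    """Assumed decision rule: accept the pair if similarity reaches the threshold."""
    return fuzzy_title_score(preprint_title, published_title) >= threshold


# Hypothetical titles, invented for illustration only.
preprint = "A Fast Algorithm for Sparse Graph Clustering"
same_title = "A fast algorithm for sparse graph clustering."
retitled = "Scalable Community Detection in Large Networks"

print(is_fuzzy_match(preprint, same_title))  # unchanged title: matched
print(is_fuzzy_match(preprint, retitled))    # changed title: missed
```

A surface-level score like this only catches pairs whose titles stay nearly identical. A retitled paper shares few characters with its preprint even when the content is the same, which is the gap a semantics-based comparison is meant to close.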
