How many preprints have actually been printed and why: a case study of computer science preprints on arXiv

Jialiang Lin; Yao Yu; Yu Zhou; Zhiyang Zhou; Xiaodong Shi

arXiv:2308.01899·cs.DL·August 4, 2023·Scientometrics

TL;DR

This study analyzes the publication rate of computer science preprints on arXiv from 2008 to 2017, introduces a BERT-based mapping method to link preprints with final publications, and identifies factors influencing publication success.

Contribution

It presents a novel semantics-based BERT method for accurately matching preprints to published papers and provides insights into the characteristics of preprints that are eventually published.

Findings

01

66% of preprints are published with unchanged titles

02

11% are published under different titles and with other modifications

03

Published preprints tend to have more revisions, more authors, more detailed abstracts, and more references, and are more likely to come with source code

Abstract

Preprints play an increasingly critical role in academic communities. There are many reasons driving researchers to post their manuscripts to preprint servers before formal submission to journals or conferences, but the use of preprints has also sparked considerable controversy, especially surrounding the claim of priority. In this paper, a case study of computer science preprints submitted to arXiv from 2008 to 2017 is conducted to quantify how many preprints have eventually been printed in peer-reviewed venues. Among those published manuscripts, some are published under different titles and without an update to their preprints on arXiv. In the case of these manuscripts, the traditional fuzzy matching method is incapable of mapping the preprint to the final published version. In view of this issue, we introduce a semantics-based mapping method with the employment of Bidirectional…
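The limitation of title-based fuzzy matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's standard `difflib` as a stand-in for a fuzzy matcher, and the function names, the example titles, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def _normalize(title: str) -> str:
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] between two titles."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def fuzzy_match(preprint_title, candidate_titles, threshold=0.9):
    """Return the most similar candidate title, or None if none clears the threshold.

    The threshold of 0.9 is an assumed value, not one taken from the paper.
    """
    best = max(
        candidate_titles,
        key=lambda t: title_similarity(preprint_title, t),
        default=None,
    )
    if best is not None and title_similarity(preprint_title, best) >= threshold:
        return best
    return None


# Hypothetical titles, invented for this example.
published = ["Scalable community detection in large networks"]

# Unchanged title (apart from letter case): the fuzzy matcher finds it.
same_title = fuzzy_match("Scalable Community Detection in Large Networks", published)

# Retitled paper: surface similarity is low, so no match is returned.
retitled = fuzzy_match("A fast method for graph clustering", published)
```

Here `same_title` is the published title and `retitled` is `None`, even though both preprints could describe the same work. That gap is the case a semantics-based method is meant to cover.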
