Differences between preprints and journal articles: Trial using bioRxiv data
Koshiba Hitoshi, Hayashi Kazuhiro

TL;DR
This study compares preprints and journal articles using bioRxiv data, finding minimal differences in external criteria and low classification accuracy, confirming previous results with larger, recent datasets.
Contribution
It demonstrates the technical feasibility of comparing preprints and journal articles and provides new evidence that differences are small, even with recent data and machine learning methods.
Findings
No clear difference in external criteria (such as number of references and word count) between preprints and journal articles.
Machine learning classification accuracy was low, at about 47%.
Differences between preprints that become journal articles and those that do not are small.
Abstract
In this paper, we attempted to obtain knowledge about how research is conducted, especially how journal articles are produced, by comparing preprints with journal articles that are finally published. First, due to the recent trend of open journals, we were able to secure a certain amount of full-text XML of preprints and journal articles, and verified the technical feasibility of comparing preprints and journal articles. On the other hand, within the scope of this trial, in which we tried to clarify the difference between them based on external criteria such as the number of references and the number of words, and simple document similarity, we could not find a clear difference between preprints and journal articles, or between preprints that became journal articles and those that did not. Even with the machine learning method, the classification accuracy was not high at about 47%.…
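As a rough illustration of the kind of comparison the abstract describes, the sketch below computes two external criteria (word count and number of references) from a full-text XML string and a simple bag-of-words cosine similarity between two texts. It is not the authors' pipeline: the function names are hypothetical, the tokenizer is deliberately naive, and the XML is assumed to follow a JATS-like layout with the main text under `<body>` and one `<ref>` element per reference.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def external_criteria(xml_string):
    # Assumes JATS-like markup: main text under <body>, one <ref> per reference.
    root = ET.fromstring(xml_string)
    body = root.find(".//body")
    body_text = " ".join(body.itertext()) if body is not None else ""
    return {
        "n_words": len(tokenize(body_text)),
        "n_references": len(root.findall(".//ref")),
    }


def cosine_similarity(text_a, text_b):
    # Cosine similarity between raw term-frequency vectors of the two texts.
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text has no tokens
    return dot / (norm_a * norm_b)
```

Applied to a preprint and its published version, `external_criteria` gives per-document counts that can be compared directly, while `cosine_similarity` gives a single score between 0 and 1 for the pair. A real study would likely need a more careful tokenizer and handling of publisher-specific XML variations.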
Taxonomy
Topics: Academic Publishing and Open Access · Biomedical Text Mining and Ontologies · Research Data Management Practices
