Reproducibility of COVID-19 pre-prints
Annie Collins, Rohan Alexander

TL;DR
This study assesses the reproducibility of COVID-19 research pre-prints by analyzing data and code availability markers across major pre-print servers, revealing low levels of open data and code sharing.
Contribution
It provides a systematic analysis of data and code availability in COVID-19 pre-prints, highlighting reproducibility challenges during the pandemic.
Findings
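No markers of open data or open code were found for 75% of sampled arXiv pre-prints
No markers of open data or open code were found for 67% of sampled bioRxiv pre-prints
No markers of open data or open code were found for 79% of sampled medRxiv pre-prints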
Abstract
To examine the reproducibility of COVID-19 research, we create a dataset of pre-prints posted to arXiv, bioRxiv, and medRxiv between 28 January 2020 and 30 June 2021 that are related to COVID-19. We extract the text from these pre-prints and parse them looking for keyword markers signaling the availability of the data and code underpinning the pre-print. For the pre-prints that are in our sample, we are unable to find markers of either open data or open code for 75 per cent of those on arXiv, 67 per cent of those on bioRxiv, and 79 per cent of those on medRxiv.
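The keyword-marker search described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the marker lists `DATA_MARKERS` and `CODE_MARKERS` and the helper names are hypothetical, since the actual keywords are not given in this summary. A pre-print counts as lacking markers only when neither a data marker nor a code marker is found in its text.

```python
import re

# Hypothetical marker patterns (assumptions for illustration only;
# the keywords actually used in the study are not listed here).
DATA_MARKERS = [r"data (?:are|is) available", r"open data", r"osf\.io", r"zenodo"]
CODE_MARKERS = [r"code (?:are|is) available", r"github\.com", r"gitlab\.com"]


def has_marker(text, patterns):
    """Return True if any pattern matches the text, ignoring case."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def classify(text):
    """Flag whether the extracted text contains open-data or open-code markers."""
    return {
        "open_data": has_marker(text, DATA_MARKERS),
        "open_code": has_marker(text, CODE_MARKERS),
    }


def share_without_markers(texts):
    """Fraction of texts with neither a data marker nor a code marker."""
    if not texts:
        return 0.0
    flags = [classify(t) for t in texts]
    missing = sum(1 for f in flags if not (f["open_data"] or f["open_code"]))
    return missing / len(texts)
```

A search of this kind only detects that a marker phrase appears. It does not verify that the linked data or code exists or runs, so the resulting share is best read as an estimate of signaled availability rather than confirmed reproducibility.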
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Academic Publishing and Open Access
