Cracking Double-Blind Review: Authorship Attribution with Deep Learning
Leonard Bauersfeld, Angel Romero, Manasi Muglikar, Davide Scaramuzza

TL;DR
This paper introduces a transformer-based neural network that attributes anonymous research papers to their authors using only the text content and the author names in the bibliography, showing that anonymity in double-blind peer review can be broken and that reviews may therefore be biased.
Contribution
The paper presents the largest authorship identification dataset to date, built from over 2 million arXiv manuscripts, and a deep learning method that attributes authorship with high accuracy, together with an analysis of scalability and of the key attribution factors.
Findings
Achieves up to 73% attribution accuracy on arXiv subsets with up to 2,000 authors
Demonstrates scalability to larger datasets with sufficient compute
Provides insights into key features influencing authorship attribution
Abstract
Double-blind peer review is considered a pillar of academic research because it is perceived to ensure a fair, unbiased, and fact-centered scientific discussion. Yet, experienced researchers can often correctly guess from which research group an anonymous submission originates, biasing the peer-review process. In this work, we present a transformer-based, neural-network architecture that only uses the text content and the author names in the bibliography to attribute an anonymous manuscript to an author. To train and evaluate our method, we created the largest authorship identification dataset to date. It leverages all research papers publicly available on arXiv amounting to over 2 million manuscripts. In arXiv-subsets with up to 2,000 different authors, our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly. We present a…
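The headline figure in the abstract, "up to 73% of papers are attributed correctly", reads as a top-1 attribution accuracy: the share of test papers whose highest-ranked candidate is the true author. A minimal sketch of that metric follows. It is not the authors' evaluation code; the function name `top_k_accuracy`, the toy author labels, and the simplification of one true author per paper are all assumptions made for illustration.

```python
def top_k_accuracy(ranked_predictions, true_authors, k=1):
    """Fraction of papers whose true author appears among the top-k candidates.

    ranked_predictions: one list per paper, candidate authors ordered from
        most to least likely (hypothetical model output).
    true_authors: one label per paper (assumes a single true author per
        paper, which is a simplification).
    """
    if len(ranked_predictions) != len(true_authors):
        raise ValueError("need exactly one ranking per paper")
    if not true_authors:
        raise ValueError("need at least one paper")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_authors)
        if truth in ranked[:k]
    )
    return hits / len(true_authors)


# Toy data: four papers, made-up author labels.
ranked = [["a", "b"], ["b", "c"], ["c", "a"], ["a", "d"]]
truth = ["a", "c", "c", "d"]

top1 = top_k_accuracy(ranked, truth, k=1)  # 2 of 4 papers correct
top2 = top_k_accuracy(ranked, truth, k=2)  # 4 of 4 papers correct
print(top1, top2)
```

With 2,000 candidate authors, uniform random guessing would yield a top-1 accuracy of 1/2,000 = 0.05%, which gives a sense of scale for the reported 73%.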
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Biomedical Text Mining and Ontologies
