LLM-Generated or Human-Written? Comparing Review and Non-Review Papers on ArXiv
Yanai Elazar, Maria Antoniak

TL;DR
This study quantifies the prevalence of LLM-generated content in arXiv papers, finding a substantial increase across both review and non-review papers with a higher prevalence in review papers, and discusses the implications of arXiv's policy prohibiting unpublished review submissions in Computer Science.
Contribution
It provides the first quantitative analysis of LLM-generated content in arXiv papers, comparing review and non-review categories with two detection methods.
Findings
LLM-generated content has increased substantially in recent years.
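Review papers show a higher prevalence of LLM-generated content than non-review papers.
In absolute numbers, however, the estimated count of LLM-generated non-review papers is almost six times that of LLM-generated review papers.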
Abstract
ArXiv recently prohibited the upload of unpublished review papers to its servers in the Computer Science domain, citing a high prevalence of LLM-generated content in these categories. However, this decision was not accompanied by quantitative evidence. In this work, we investigate this claim by measuring the proportion of LLM-generated content in review vs. non-review research papers in recent years. Using two high-quality detection methods, we find a substantial increase in LLM-generated content across both review and non-review papers, with a higher prevalence in review papers. However, when considering the number of LLM-generated papers published in each category, the estimates of non-review LLM-generated papers are almost six times higher. Furthermore, we find that this policy will affect papers in certain domains far more than others, with the CS subdiscipline Computers & Society…
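The contrast between prevalence and absolute counts in the abstract is a base-rate effect: a category with a lower rate can still contribute more LLM-generated papers if it contains many more papers overall. The sketch below illustrates only that arithmetic. Every number in it is invented for illustration and is not taken from the paper, and the `Category` class and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """A paper category with a hypothetical size and detection rate."""
    name: str
    total_papers: int    # hypothetical count, NOT from the paper
    llm_fraction: float  # hypothetical share flagged as LLM-generated, NOT from the paper

    def estimated_llm_papers(self) -> float:
        # Estimated count = category size times prevalence rate.
        return self.total_papers * self.llm_fraction


# Made-up values chosen only to show the effect.
review = Category("review", total_papers=2_000, llm_fraction=0.25)
non_review = Category("non-review", total_papers=16_000, llm_fraction=0.125)

# Ratio of estimated LLM-generated non-review papers to review papers.
ratio = non_review.estimated_llm_papers() / review.estimated_llm_papers()

for cat in (review, non_review):
    print(f"{cat.name}: rate={cat.llm_fraction:.1%}, "
          f"estimated LLM-generated papers={cat.estimated_llm_papers():.0f}")
print(f"non-review / review count ratio: {ratio:.1f}")
```

In this toy setup the review category has twice the prevalence, yet the non-review category yields more LLM-generated papers because it is eight times larger. The paper's actual estimates come from its two detection methods and are not reproduced here.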
Taxonomy
Topics: Academic integrity and plagiarism · Academic Publishing and Open Access · Scientific Computing and Data Management
