Mapping the Increasing Use of LLMs in Scientific Papers
Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou

TL;DR
This study systematically measures the increasing use of large language models in scientific papers across multiple disciplines, revealing significant growth especially in computer science, and correlating usage with research activity and paper length.
Contribution
First large-scale, corpus-level analysis quantifying LLM usage in scientific publications across various fields over time.
Findings
LLM usage in papers has steadily increased, especially in computer science.
Mathematics papers and Nature portfolio journals show lower rates of LLM modification.
Higher rates of LLM modification are associated with frequent preprint posting, crowded research areas, and shorter papers.
Abstract
Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our…
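The abstract (truncated here) does not spell out the population-level framework. As a rough illustration only, one common way to estimate a corpus-wide fraction of modified text is to treat the corpus as a mixture of two reference distributions and fit the mixing weight by maximum likelihood. The sketch below assumes this mixture formulation; the vocabulary, the per-word occurrence probabilities, and all function names are hypothetical and not taken from the paper.

```python
import math
import random

# Hypothetical per-word occurrence probabilities under each reference
# distribution. These numbers are made up for illustration only.
HUMAN_PROBS = {"word_a": 0.02, "word_b": 0.03, "word_c": 0.01, "word_d": 0.02}
LLM_PROBS = {"word_a": 0.30, "word_b": 0.25, "word_c": 0.20, "word_d": 0.25}


def doc_log_prob(doc, word_probs):
    """Log-probability of a document (a set of words) assuming each
    vocabulary word occurs independently with the given probability."""
    return sum(
        math.log(p) if word in doc else math.log(1.0 - p)
        for word, p in word_probs.items()
    )


def corpus_log_likelihood(alpha, log_p, log_q):
    """Log-likelihood of the corpus under the mixture (1 - alpha) * P + alpha * Q."""
    total = 0.0
    for lp, lq in zip(log_p, log_q):
        m = max(lp, lq)  # factor out the larger term for numerical stability
        total += m + math.log(
            (1.0 - alpha) * math.exp(lp - m) + alpha * math.exp(lq - m)
        )
    return total


def estimate_alpha(docs, human_probs, llm_probs, tol=1e-6):
    """Maximum-likelihood estimate of the mixing fraction alpha in [0, 1].
    The log-likelihood is concave in alpha, so ternary search suffices."""
    log_p = [doc_log_prob(d, human_probs) for d in docs]
    log_q = [doc_log_prob(d, llm_probs) for d in docs]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if corpus_log_likelihood(m1, log_p, log_q) < corpus_log_likelihood(m2, log_p, log_q):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def simulate_corpus(n_docs, alpha, human_probs, llm_probs, rng):
    """Draw synthetic documents; each comes from the LLM distribution
    with probability alpha, otherwise from the human distribution."""
    docs = []
    for _ in range(n_docs):
        probs = llm_probs if rng.random() < alpha else human_probs
        docs.append({w for w, p in probs.items() if rng.random() < p})
    return docs


# Synthetic check: generate a corpus with a known fraction, then recover it.
rng = random.Random(0)
docs = simulate_corpus(5000, 0.2, HUMAN_PROBS, LLM_PROBS, rng)
alpha_hat = estimate_alpha(docs, HUMAN_PROBS, LLM_PROBS)
print(f"estimated fraction: {alpha_hat:.3f}")
```

Note that an estimate of this kind describes the corpus as a whole; it does not label any individual document as LLM-modified.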
Taxonomy
Topics: Library Science and Information Systems · Artificial Intelligence in Law · Natural Language Processing Techniques
