Mapping the Increasing Use of LLMs in Scientific Papers

Weixin Liang; Yaohui Zhang; Zhengxuan Wu; Haley Lepp; Wenlong Ji; Xuandong Zhao; Hancheng Cao; Sheng Liu; Siyu He; Zhi Huang; Diyi Yang; Christopher Potts; Christopher D. Manning; James Y. Zou

arXiv:2404.01268 · cs.CL · April 2, 2024 · 38 cites

TL;DR

This study systematically measures the increasing use of large language models in scientific papers across multiple disciplines, revealing significant growth especially in computer science, and correlating usage with research activity and paper length.

Contribution

First large-scale, corpus-level analysis quantifying LLM usage in scientific publications across various fields over time.

Findings

01. LLM usage in papers has increased steadily, most markedly in computer science.

02. Mathematics papers and Nature portfolio journals show lower LLM-modification rates.

03. Higher LLM modification correlates with more frequent preprint posting, more crowded research areas, and shorter papers.

Abstract

Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our…
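The abstract mentions a population-level statistical framework without detailing it. As a rough illustration of the general idea behind such frameworks (estimating, by maximum likelihood, the fraction alpha of a corpus drawn from an LLM-modified distribution Q rather than a human-written distribution P, under the mixture (1 - alpha) * P + alpha * Q), here is a minimal sketch. It is not the authors' code: the binary word-occurrence model, the function names (doc_log_likelihood, estimate_alpha), the toy reference probabilities p_human and q_llm, and the grid search are all hypothetical assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence

def doc_log_likelihood(x: Sequence[int], probs: Sequence[float]) -> float:
    """Log-probability of a binary word-occurrence vector, assuming
    (hypothetically) that each vocabulary word appears independently."""
    return sum(math.log(p) if xi else math.log(1.0 - p) for xi, p in zip(x, probs))

def estimate_alpha(docs: List[List[int]], p_human: Sequence[float],
                   q_llm: Sequence[float], grid_size: int = 201) -> float:
    """Grid-search MLE of alpha in the mixture (1 - alpha) * P + alpha * Q."""
    # Per-document log-likelihoods under each reference distribution.
    lp = [doc_log_likelihood(d, p_human) for d in docs]
    lq = [doc_log_likelihood(d, q_llm) for d in docs]
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        alpha = i / (grid_size - 1)
        ll = 0.0
        for a, b in zip(lp, lq):
            # log((1 - alpha) * e^a + alpha * e^b), shifted by max(a, b) for stability.
            m = max(a, b)
            mix = (1 - alpha) * math.exp(a - m) + alpha * math.exp(b - m)
            if mix <= 0.0:  # this alpha cannot explain the document at all
                ll = -math.inf
                break
            ll += m + math.log(mix)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

def sample_doc(rng: random.Random, probs: Sequence[float]) -> List[int]:
    """Draw one synthetic document as binary word occurrences."""
    return [1 if rng.random() < p else 0 for p in probs]

# Toy vocabulary of 20 words: the first 10 are (hypothetically) favored by
# LLM-modified text, the last 10 by human-written text.
p_human = [0.1] * 10 + [0.5] * 10
q_llm = [0.5] * 10 + [0.1] * 10

rng = random.Random(0)
true_alpha = 0.3
docs = [sample_doc(rng, q_llm if rng.random() < true_alpha else p_human)
        for _ in range(1000)]
alpha_hat = estimate_alpha(docs, p_human, q_llm)
print(f"estimated alpha = {alpha_hat:.3f}")
```

On this synthetic corpus the estimate should land near the true value of 0.3. A framework of this kind estimates a corpus-level proportion rather than classifying individual papers, which is what "population-level" suggests; the real reference distributions would have to be estimated from known human-written and LLM-generated text.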

Code & Models

Repositories

Weixin-Liang/Mapping-the-Increasing-Use-of-LLMs-in-Scientific-Papers

Taxonomy

Topics: Library Science and Information Systems · Artificial Intelligence in Law · Natural Language Processing Techniques