Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

Weixin Liang; Zachary Izzo; Yaohui Zhang; Haley Lepp; Hancheng Cao; Xuandong Zhao; Lingjiao Chen; Haotian Ye; Sheng Liu; Zhi Huang; Daniel A. McFarland; James Y. Zou

arXiv:2403.07183 · cs.CL · May 20, 2026 · 65 cites

TL;DR

This study introduces a maximum likelihood approach for estimating the fraction of peer-review text that was substantially modified or generated by large language models, and applies it to major AI conferences held after the release of ChatGPT.

Contribution

It provides a scalable method for estimating LLM-modified content at the corpus level and applies it to real peer-review data from ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023.

Findings

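01  An estimated 6.5% to 16.9% of peer-review text may have been substantially modified by LLMs

02  The estimated fraction of LLM-generated text is higher in reviews that report lower confidence and in reviews submitted near the deadline

03  Corpus-level trends in generated text suggest subtle shifts in peer review practices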

Abstract

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence,…
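The corpus-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each document is reduced to a set of words, that word presence is modeled as independent Bernoulli variables fitted on the two reference corpora, and that the mixture weight is found by a simple grid search. All function names, the toy vocabulary, and the synthetic data are hypothetical. Under these assumptions, the estimate is the value of alpha that maximizes the sum over documents of log((1 - alpha) P(x) + alpha Q(x)), where P and Q are the fitted human and AI reference distributions.

```python
import math
import random


def occurrence_probs(docs, vocab, smoothing=1.0):
    """Smoothed probability that each vocabulary word appears in a document."""
    n = len(docs)
    return {w: (sum(w in d for d in docs) + smoothing) / (n + 2.0 * smoothing)
            for w in vocab}


def doc_log_likelihood(doc, probs):
    """Log-probability of a document's word-presence pattern (independent Bernoullis)."""
    return sum(math.log(p if w in doc else 1.0 - p) for w, p in probs.items())


def estimate_alpha(log_p, log_q, grid=200):
    """Grid-search the mixture weight alpha that maximizes the corpus log-likelihood."""
    best_alpha, best_ll = 0.0, -math.inf
    for k in range(grid + 1):
        alpha = k / grid
        ll = 0.0
        for lp, lq in zip(log_p, log_q):
            m = max(lp, lq)  # factor out the larger term for numerical stability
            mix = (1.0 - alpha) * math.exp(lp - m) + alpha * math.exp(lq - m)
            ll += m + math.log(max(mix, 1e-300))  # floor guards against log(0)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


def run_demo(seed=0, true_alpha=0.2, n_target=600):
    """Synthetic check: build a corpus with a known AI fraction and recover it."""
    rng = random.Random(seed)
    vocab = [f"w{i}" for i in range(40)]
    # Hypothetical word-occurrence rates for the two sources (toy values).
    human = {w: (0.1 if i < 20 else 0.7) for i, w in enumerate(vocab)}
    ai = {w: (0.7 if i < 20 else 0.1) for i, w in enumerate(vocab)}

    def sample(probs):
        return {w for w, p in probs.items() if rng.random() < p}

    # Reference corpora play the role of expert-written and AI-generated texts.
    human_ref = [sample(human) for _ in range(300)]
    ai_ref = [sample(ai) for _ in range(300)]

    # Target corpus with a known fraction of AI-sourced documents.
    n_ai = round(true_alpha * n_target)
    target = ([sample(ai) for _ in range(n_ai)]
              + [sample(human) for _ in range(n_target - n_ai)])

    p_hat = occurrence_probs(human_ref, vocab)
    q_hat = occurrence_probs(ai_ref, vocab)
    log_p = [doc_log_likelihood(d, p_hat) for d in target]
    log_q = [doc_log_likelihood(d, q_hat) for d in target]
    return estimate_alpha(log_p, log_q)


if __name__ == "__main__":
    print(f"estimated alpha: {run_demo():.3f}")
```

Note that the sketch never labels an individual document as human- or AI-written; it only estimates the mixing fraction for the corpus as a whole, which is the quantity the abstract describes.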

Code & Models

Repositories

Weixin-Liang/Mapping-the-Increasing-Use-of-LLMs-in-Scientific-Papers
none · Official

Videos

5 Key Quotes: Altman, Huang and 'The Most Interesting Year' · youtube

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews · slideslive

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education