Comment on Scientific production in the era of large language models
Thomas Renault, Antonin Bergeaud, Clément Bosquet

TL;DR
This comment examines how Kusumegi et al. (2025) measure researchers' preprint output after adopting large language models (LLMs), and shows that the measured increase can be a spurious artifact of dating adoption by the first month in which an abstract is flagged by an LLM detector.
Contribution
It demonstrates that an event study centered on first detection can show positive post-event dynamics even when LLM adoption has no causal effect on output, because the timing rule itself selects high-output months.
Findings
Detection-based adoption timing is mechanically related to output levels.
Placebo exercises produce similar positive post-treatment patterns.
Spurious effects can arise even with no true causal impact.
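The first finding can be made concrete with a short derivation. This is a sketch under assumptions that are not taken from the paper: each paper is flagged independently with the same probability $p$, $n_{it}$ denotes the number of papers researcher $i$ submits in month $t$, and $D_{it}$ indicates that at least one of them is flagged.

```latex
% Assumption (not from the paper): independent per-paper flags with probability p.
\Pr(D_{it} = 1 \mid n_{it}) = 1 - (1 - p)^{n_{it}},
\qquad
\mathbb{E}[\,n_{it} \mid D_{it} = 1\,]
  = \frac{\mathbb{E}\!\left[\,n_{it}\left(1 - (1 - p)^{n_{it}}\right)\right]}
         {\mathbb{E}\!\left[\,1 - (1 - p)^{n_{it}}\right]}
  \;\ge\; \mathbb{E}[\,n_{it}\,].
```

The inequality holds because $1 - (1 - p)^{n}$ is increasing in $n$, so flagged months are size-biased toward high output, while months with no flag are biased toward low output.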
Abstract
Kusumegi et al. (2025) study whether researchers' preprint output rises after adopting large language models (LLMs), dating adoption as the first month in which at least one submitted abstract exceeds an LLM-detection threshold. We show that this treatment-timing rule is mechanically related to output. The probability that at least one paper is flagged in a month is increasing in the number of papers submitted in that month, so detected-adoption months are disproportionately high-output months. An event study centered on first detection can therefore display positive post-event dynamics even when the flagging rule contains no information about true LLM adoption, because the omitted pre-treatment period is selected from months with no prior detection. We demonstrate this in a simulation: with i.i.d. productivity and no causal effect, first-detection timing generates a spurious positive…
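The mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation: the function name `simulate_event_study`, the uniform 0 to 3 papers per month, the per-paper flag probability of 0.2, and the comparison of raw means by event time are all illustrative assumptions. Output is i.i.d. across months and flags are pure noise, so any post-event rise is spurious by construction.

```python
import random
from collections import defaultdict


def simulate_event_study(n_researchers=5000, n_months=36, flag_prob=0.2, seed=0):
    """Mean monthly output by event time relative to first detection.

    All parameters are hypothetical choices made for illustration.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(n_researchers):
        # i.i.d. monthly output: 0 to 3 papers, same distribution every month.
        papers = [rng.randint(0, 3) for _ in range(n_months)]
        # Each paper is flagged independently with probability flag_prob,
        # so a flag carries no information about true LLM adoption.
        flagged = [
            any(rng.random() < flag_prob for _ in range(n)) for n in papers
        ]
        if True not in flagged:
            continue  # never "detected": no event date
        first = flagged.index(True)  # detected-adoption month
        for k in (-3, -2, -1, 0, 1, 2, 3):
            t = first + k
            if 0 <= t < n_months:
                totals[k] += papers[t]
                counts[k] += 1
    return {k: totals[k] / counts[k] for k in sorted(counts)}


event_means = simulate_event_study()
for k, mean in event_means.items():
    print(f"event time {k:+d}: mean papers = {mean:.3f}")
```

Under these assumptions, pre-detection months contain no flagged paper by construction and so are selected toward low output, the detection month is selected toward high output, and later months are unselected. Mean output after detection therefore exceeds the pre-detection reference level even though nothing changed for any simulated researcher.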
