Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text
Tom Kempton, Viktor Drobnyi, Maeve Madigan, Stuart Burrell

TL;DR
This paper shows that likelihood-based detectors of machine-generated text suffer from a form of Simpson's paradox and introduces a Bayesian calibration method that substantially improves detection performance.
Contribution
It diagnoses why likelihood-based detectors underperform and proposes a modular Bayesian calibration approach that improves detection across models and datasets.
Findings
Calibrated detectors achieve higher AUROC than their uncalibrated counterparts, e.g., rising from 0.63 to 0.85 on GPT-5.4.
The proposed method improves detection performance across all baseline detectors and datasets.
The diagnosis and calibration approach are compatible with any token-averaging detection pipeline.
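To make "token-averaging detection pipeline" concrete, the block below sketches the general form such detectors take and where a local calibration step could slot in. The notation ($s_i$, $h_i$, $\tau$) is ours, and reading the calibrated token score as a local log-likelihood ratio is an assumption: this excerpt does not specify the paper's predictors.

```latex
% Generic token-averaging detector: a per-token score s_i
% (e.g. s_i = \log p_\theta(x_i \mid x_{<i})) averaged over the n tokens of x.
S(x) = \frac{1}{n} \sum_{i=1}^{n} s_i,
\qquad \text{flag } x \text{ as machine-generated if } S(x) > \tau.

% Assumed calibrated variant: each s_i is first mapped to a learned local
% log-likelihood ratio that depends on the detector's hidden state h_i.
S_{\mathrm{cal}}(x) = \frac{1}{n} \sum_{i=1}^{n}
  \log \frac{p(s_i \mid \text{machine}, h_i)}{p(s_i \mid \text{human}, h_i)}.
```

If the token scores were conditionally independent given the hidden states, $n \, S_{\mathrm{cal}}(x)$ would be the log-likelihood ratio of all the scores in $x$, which is the statistic Bayesian decision theory says to threshold. The raw average $S(x)$ instead treats a given $s_i$ identically wherever it occurs in hidden space.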
Abstract
The ability to reliably distinguish human-written text from that generated by large language models is of profound societal importance. The dominant approach to this problem exploits the likelihood hypothesis: that machine-generated text should appear more probable to a detector language model than human-written text. However, we demonstrate that the token-level signal distinguishing human and machine text is non-uniform across the hidden space of the detector model, and naively averaging likelihood-based token scores across regions with fundamentally different statistical structure, as most detectors do, causes a form of Simpson's paradox: a strong local signal is destroyed by inappropriate aggregation. To correct for this, we introduce a learned local calibration step grounded in Bayesian decision theory. Rather than aggregating raw token scores, we first learn lightweight predictors…
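The aggregation failure described in the abstract can be reproduced in a deliberately contrived toy. Everything in the sketch below is hypothetical: the two "regions" A and B stand in for areas of the detector's hidden space (and are handed to the code as oracle labels, whereas a real system would have to infer them from hidden states), the token scores are simulated Gaussian log-likelihoods, and the per-region Gaussian log-likelihood-ratio calibrator is a simple stand-in for the paper's learned local calibration, whose predictors the truncated abstract does not describe.

```python
import math
import random
from statistics import fmean, pstdev

# Hypothetical toy parameters, chosen to make the paradox obvious.
# Within EACH region, machine tokens are 0.5 nats more probable than human ones.
REGION_MEANS = {"A": {"human": -1.0, "machine": -0.5},
                "B": {"human": -5.0, "machine": -4.5}}
# Share of tokens falling in region A differs by source: this is the confounder.
SHARE_A = {"human": 0.8, "machine": 0.4}
TOKEN_SD = 1.0


def sample_doc(rng, label, n_tokens=50):
    """Return a list of (region, token_log_likelihood) pairs for one document."""
    doc = []
    for _ in range(n_tokens):
        region = "A" if rng.random() < SHARE_A[label] else "B"
        doc.append((region, rng.gauss(REGION_MEANS[region][label], TOKEN_SD)))
    return doc


def make_split(rng, n_per_class):
    """Generate n_per_class human and n_per_class machine documents."""
    labels = ["human"] * n_per_class + ["machine"] * n_per_class
    return [sample_doc(rng, lab) for lab in labels], labels


def fit_calibrator(docs, labels):
    """Fit one Gaussian (mean, sd) per (region, label) to labelled token scores."""
    pooled = {}
    for doc, label in zip(docs, labels):
        for region, s in doc:
            pooled.setdefault((region, label), []).append(s)
    return {key: (fmean(v), pstdev(v)) for key, v in pooled.items()}


def log_normal_pdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def raw_score(doc):
    """Baseline: average raw token log-likelihood, ignoring regions."""
    return fmean(s for _, s in doc)


def calibrated_score(doc, params):
    """Average local log-likelihood ratio log p(s|machine,r) - log p(s|human,r)."""
    return fmean(
        log_normal_pdf(s, *params[(r, "machine")])
        - log_normal_pdf(s, *params[(r, "human")])
        for r, s in doc
    )


def auroc(pos, neg):
    """P(random machine doc outscores random human doc); ties count one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(score_fn, docs, labels):
    scores = [score_fn(d) for d in docs]
    pos = [s for s, lab in zip(scores, labels) if lab == "machine"]
    neg = [s for s, lab in zip(scores, labels) if lab == "human"]
    return auroc(pos, neg)


rng = random.Random(0)
train_docs, train_labels = make_split(rng, 200)
test_docs, test_labels = make_split(rng, 200)
params = fit_calibrator(train_docs, train_labels)

raw_auc = evaluate(raw_score, test_docs, test_labels)
cal_auc = evaluate(lambda d: calibrated_score(d, params), test_docs, test_labels)
print(f"raw AUROC = {raw_auc:.3f}, calibrated AUROC = {cal_auc:.3f}")
```

With these parameters the expected raw average is 0.8(-1.0) + 0.2(-5.0) = -1.8 for human text but 0.4(-0.5) + 0.6(-4.5) = -2.9 for machine text, so pooling reverses a signal that supports the likelihood hypothesis in every region, while the calibrated score compares each token only against tokens from its own region. In this toy the reversal is total; the abstract claims only that inappropriate aggregation destroys the local signal, not necessarily that it flips it.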
