Segmenting Human-LLM Co-authored Text via Change Point Detection
Mengchu Li, Jin Zhu, Jinglai Li, Chengchun Shi

TL;DR
This paper adapts change point detection algorithms to segment text into human- and LLM-authored parts, a task that existing detectors, which return a single binary label for an entire passage, cannot handle for mixed texts.
Contribution
It adapts classical change point detection methods to the problem of localizing human- and LLM-authored segments in co-authored text, establishes the minimax optimality of the procedure, and reports better performance than existing baselines.
Findings
The proposed algorithms segment co-authored text into human- and LLM-authored pieces.
The proposed methods outperform existing baselines.
The procedure is theoretically grounded: it is shown to be minimax optimal.
Abstract
The rise of large language models (LLMs) has created an urgent need to distinguish between human-written and LLM-generated text to ensure authenticity and societal trust. Existing detectors typically provide a binary classification for an entire passage; however, this is insufficient for human-LLM co-authored text, where the objective is to localize specific segments authored by humans or LLMs. To bridge this gap, we propose algorithms to segment text into human- and LLM-authored pieces. Our key observation is that such a segmentation task is conceptually similar to classical change point detection in time-series analysis. Leveraging this analogy, we adapt change point detection to LLM-generated text detection, develop a weighted algorithm and a generalized algorithm to accommodate heterogeneous detection score variability, and establish the minimax optimality of our procedure.…
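To make the change point analogy in the abstract concrete, here is a minimal sketch of the classical single change point CUSUM estimator applied to a sequence of per-token (or per-sentence) detection scores. This is the textbook unweighted statistic, not the paper's weighted or generalized algorithm. The function name `cusum_change_point`, the toy scores, and the assumption that one score is available per position are all illustrative and do not come from the paper.

```python
from itertools import accumulate
from math import sqrt


def cusum_change_point(scores):
    """Estimate a single change point in a sequence of detection scores.

    Returns (tau, stat): scores[:tau] and scores[tau:] are the two
    estimated segments, and stat is the maximal CUSUM contrast.
    Hypothetical helper for illustration; not the paper's procedure.
    """
    x = [float(s) for s in scores]
    n = len(x)
    if n < 2:
        raise ValueError("need at least two scores")

    prefix = list(accumulate(x))  # prefix[t - 1] == sum(x[:t])
    total = prefix[-1]

    best_tau, best_stat = 1, -1.0
    for t in range(1, n):  # candidate split points 1..n-1
        left_mean = prefix[t - 1] / t
        right_mean = (total - prefix[t - 1]) / (n - t)
        # Classical CUSUM contrast: scaled difference of segment means.
        stat = sqrt(t * (n - t) / n) * abs(left_mean - right_mean)
        if stat > best_stat:
            best_tau, best_stat = t, stat
    return best_tau, best_stat


# Toy input (assumed): 60 "human-like" scores followed by 40 "LLM-like" scores.
scores = [0.0] * 60 + [1.0] * 40
tau, stat = cusum_change_point(scores)
print(tau, round(stat, 3))
```

On real text the scores are noisy and may have position-dependent variance, which is the heterogeneity the abstract says the weighted and generalized algorithms are designed to accommodate. Texts with several authorship switches would also need a multiple change point extension of this single-split search.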
