Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Yihuai Xu, Yongwei Wang, Yifei Bi, Huangsen Cao, Zhouhan Lin, Yu Zhao, Fei Wu

TL;DR
This paper introduces Lastde, a training-free method that combines local and global analysis of token probability sequences, including time series techniques, to detect LLM-generated text robustly across a range of scenarios.
Contribution
The paper presents Lastde, a novel training-free detector that integrates local and global statistical features, including temporal dynamics, for improved LLM-generated text detection.
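To make the local/global distinction concrete, here is a minimal sketch, not the authors' implementation. It assumes the per-token log-probabilities have already been obtained from some scoring model. The global feature is the mean log-probability of the sequence. The local feature is a diversity-entropy-style statistic over sliding windows, chosen here as one plausible time series measure. The function names, the window length, and the bin count are all hypothetical.

```python
import math
from typing import List, Sequence


def mean_log_prob(log_probs: Sequence[float]) -> float:
    """Global feature: average token log-probability of the whole text."""
    return sum(log_probs) / len(log_probs)


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length windows (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def local_diversity_entropy(
    log_probs: Sequence[float], window: int = 3, bins: int = 10
) -> float:
    """Local feature (assumed for illustration): normalized entropy of the
    cosine similarities between adjacent sliding windows of the sequence."""
    windows: List[Sequence[float]] = [
        log_probs[i : i + window] for i in range(len(log_probs) - window + 1)
    ]
    sims = [_cosine(windows[i], windows[i + 1]) for i in range(len(windows) - 1)]
    if not sims:
        return 0.0
    # Histogram the similarities over [-1, 1] into equal-width bins.
    counts = [0] * bins
    for s in sims:
        idx = int((s + 1.0) / 2.0 * bins)
        counts[max(0, min(idx, bins - 1))] += 1
    total = len(sims)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c)
    return entropy / math.log(bins)  # scale to [0, 1]


# Hypothetical token log-probabilities for one text.
example = [-0.1, -5.0, -0.2, -4.0, -0.3, -6.0, -0.1, -0.1, -0.1, -3.0]
features = (mean_log_prob(example), local_diversity_entropy(example))
```

A detector in this spirit would combine the two features into a single score and threshold it. How Lastde aggregates them is not stated in this summary, so the combination step is left out.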
Findings
Achieves state-of-the-art detection accuracy across multiple datasets.
Demonstrates robustness against paraphrasing attacks.
Effective in cross-domain, cross-model, and cross-lingual scenarios.
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in generating high-quality texts across diverse domains. However, the potential misuse of LLMs has raised significant concerns, underscoring the urgent need for reliable detection of LLM-generated texts. Conventional training-based detectors often struggle with generalization, particularly in cross-domain and cross-model scenarios. In contrast, training-free methods, which focus on inherent discrepancies through carefully designed statistical features, offer improved generalization and interpretability. Despite this, existing training-free detection methods typically rely on global text sequence statistics, neglecting the modeling of local discriminative features, thereby limiting their detection efficacy. In this work, we introduce a novel training-free detector, termed Lastde, that synergizes local and…
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques
