Training-free LLM-generated Text Detection by Mining Token Probability Sequences

Yihuai Xu; Yongwei Wang; Yifei Bi; Huangsen Cao; Zhouhan Lin; Yu Zhao; Fei Wu

arXiv:2410.06072·cs.CL·October 10, 2024

TL;DR

This paper introduces Lastde, a training-free method that combines local and global token probability sequence analysis, including time series techniques, to effectively detect LLM-generated texts across various scenarios with high robustness.

Contribution

The paper presents Lastde, a novel training-free detector that integrates local and global statistical features, including temporal dynamics, for improved LLM-generated text detection.

Findings

1. Achieves state-of-the-art detection accuracy across multiple datasets.
2. Demonstrates robustness against paraphrasing attacks.
3. Remains effective in cross-domain, cross-model, and cross-lingual scenarios.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in generating high-quality texts across diverse domains. However, the potential misuse of LLMs has raised significant concerns, underscoring the urgent need for reliable detection of LLM-generated texts. Conventional training-based detectors often struggle with generalization, particularly in cross-domain and cross-model scenarios. In contrast, training-free methods, which focus on inherent discrepancies through carefully designed statistical features, offer improved generalization and interpretability. Despite this, existing training-free detection methods typically rely on global text sequence statistics, neglecting the modeling of local discriminative features, thereby limiting their detection efficacy. In this work, we introduce a novel training-free detector, termed Lastde, that synergizes local and…
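
The distinction the abstract draws between global and local statistics of a token probability sequence can be illustrated with a minimal sketch. This is not the Lastde statistic, which the truncated abstract does not specify. The function names `global_score` and `local_score`, the window size, and the toy log-probability values are all hypothetical, chosen only to show how two sequences with the same global average can differ in local structure.

```python
from statistics import mean, pstdev


def global_score(log_probs):
    """Global statistic: mean token log-probability over the whole text."""
    return mean(log_probs)


def local_score(log_probs, window=3):
    """Local statistic (illustrative assumption, not the paper's formula):
    spread of sliding-window means along the sequence."""
    if len(log_probs) < window:
        raise ValueError("sequence is shorter than the window")
    window_means = [
        mean(log_probs[i:i + window])
        for i in range(len(log_probs) - window + 1)
    ]
    return pstdev(window_means)


# Hypothetical token log-probabilities for two texts (made-up toy values).
flat = [-1.0, -1.1, -0.9, -1.0, -1.1, -0.9]
bursty = [-0.1, -0.2, -2.7, -0.1, -2.8, -0.1]

print("global:", global_score(flat), global_score(bursty))
print("local: ", local_score(flat), local_score(bursty))
```

Both toy sequences share the same mean log-probability, so a purely global statistic cannot separate them, while the windowed statistic can. In practice the log-probabilities would come from a scoring language model rather than hand-written lists.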

Videos

Training-free LLM-generated Text Detection by Mining Token Probability Sequences (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques