Variation is the Key: A Variation-Based Framework for LLM-Generated Text Detection
Xuecong Li, Xiaohong Li, Qiang Hu, Yao Zhang, Junjie Wang

TL;DR
This paper introduces VaryBalance, a practical detection method that leverages differences between original and LLM-rewritten texts to effectively distinguish human from AI-generated content, outperforming existing detectors.
Contribution
The paper presents a novel variation-based framework for LLM-generated text detection that is simple, effective, and robust across models and languages.
Findings
VaryBalance outperforms state-of-the-art detectors such as Binoculars by up to 34.3% in terms of AUROC.
It maintains robustness across multiple LLMs and languages.
The method relies on quantifying differences between human texts and their LLM-rewritten versions.
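The findings are reported in AUROC, so a small reference computation may help in reading the numbers. The sketch below is a generic pairwise AUROC, not the paper's evaluation code. The function name `auroc` and the convention that label 1 marks the positive class are assumptions made for illustration.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    (label 1) receives a higher score than a randomly chosen negative
    example (label 0); ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUROC of 1.0 means the detector's scores separate the two classes perfectly, while 0.5 is what random scoring would give. The quadratic loop is fine for small evaluation sets; a rank-based formulation would be preferable for large ones.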
Abstract
Detecting text generated by large language models (LLMs) is crucial but challenging. Existing detectors depend on impractical assumptions, such as white-box settings, or rely solely on text-level features, leading to imprecise detection. In this paper, we propose a simple but effective and practical LLM-generated text detection method, VaryBalance. The core observation behind VaryBalance is that human texts differ more from their LLM-rewritten versions than LLM-generated texts do. Leveraging this observation, VaryBalance quantifies the difference through mean standard deviation and uses it to distinguish human texts from LLM-generated texts. Comprehensive experiments demonstrate that VaryBalance outperforms the state-of-the-art detectors, i.e., Binoculars, by up to 34.3% in terms of AUROC, and maintains robustness against multiple generating models and languages.
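The rewrite-and-compare idea in the abstract can be sketched roughly as follows. This is not the authors' implementation: the abstract does not specify the difference measure or how the mean and standard deviation are combined, so the word-level `difflib` dissimilarity, the `rewrite` callable (a stand-in for an LLM rewriting call), the `threshold` value, and all function names here are hypothetical choices for illustration only.

```python
import difflib
import statistics
from typing import Callable, Tuple


def difference(a: str, b: str) -> float:
    """Word-level dissimilarity in [0, 1]; 0.0 means identical word sequences.

    Assumption: a simple difflib ratio stands in for whatever measure
    the paper actually uses.
    """
    matcher = difflib.SequenceMatcher(None, a.split(), b.split())
    return 1.0 - matcher.ratio()


def variation_stats(
    text: str, rewrite: Callable[[str], str], n_rewrites: int = 4
) -> Tuple[float, float]:
    """Rewrite `text` several times; return (mean, std) of the differences."""
    diffs = [difference(text, rewrite(text)) for _ in range(n_rewrites)]
    return statistics.mean(diffs), statistics.pstdev(diffs)


def looks_human(
    text: str,
    rewrite: Callable[[str], str],
    threshold: float = 0.1,  # hypothetical cut-off, not taken from the paper
    n_rewrites: int = 4,
) -> bool:
    """Flag text as human-written when rewriting changes it a lot."""
    mean_diff, _ = variation_stats(text, rewrite, n_rewrites)
    return mean_diff > threshold


def toy_rewrite(text: str) -> str:
    """Deterministic stand-in for an LLM rewriter, used only for the demo."""
    swaps = {"gonna": "going to", "kinda": "somewhat", "stuff": "material"}
    return " ".join(swaps.get(word, word) for word in text.split())
```

With the toy rewriter, `looks_human("i was gonna read that stuff but it was kinda long", toy_rewrite)` returns `True` because the rewrite changes several words, while `looks_human("the report is clear and concise", toy_rewrite)` returns `False` because the text passes through unchanged. In practice `rewrite` would wrap a real LLM call, and the threshold would be tuned on labelled data.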
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Natural Language Processing Techniques
