Detecting Machine-Generated Long-Form Content with Latent-Space Variables
Yufei Tian, Zeyu Pan, Nanyun Peng

TL;DR
This paper introduces a robust detection method for machine-generated long-form content by analyzing event transitions in latent space, outperforming existing token-based detectors and revealing inherent differences in event generation between humans and LLMs.
Contribution
The paper presents a novel latent-space model that leverages event transitions to distinguish machine from human texts, improving detection accuracy under domain shifts.
Findings
31% improvement over DetectGPT in distinguishing texts
Modern LLMs generate event triggers differently from humans
Latent-space analysis reveals inherent differences in event transitions
Abstract
The increasing capability of large language models (LLMs) to generate fluent long-form texts is presenting new challenges in distinguishing machine-generated outputs from human-written ones, which is crucial for ensuring authenticity and trustworthiness of expressions. Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts, including different prompting and decoding strategies, and adversarial attacks. We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts by training a latent-space model on sequences of events or topics derived from human-written texts. In three different domains, machine-generated texts, which are originally inseparable from human texts on the token level, can be better distinguished with our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsImage Processing and 3D Reconstruction · Handwritten Text Recognition Techniques
MethodsAttention Is All You Need · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings
