RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns

Xin Chen; Junchao Wu; Shu Yang; Runzhe Zhan; Zeyu Wu; Ziyang Luo; Di Wang; Min Yang; Lidia S. Chao; Derek F. Wong

arXiv:2508.13152·cs.CL·August 19, 2025

TL;DR

RepreGuard is a detection method that uses the internal neural activation patterns of LLMs to distinguish machine-generated text from human-written text. It reports 94.92% AUROC on both in-distribution (ID) and out-of-distribution (OOD) data and stays robust across text sizes and attacks.

Contribution

This paper introduces RepreGuard, a statistics-based detection approach built on internal LLM representations, aimed at the weak out-of-distribution (OOD) robustness of existing detection methods.

Findings

01. Achieves 94.92% AUROC on both in-distribution (ID) and out-of-distribution (OOD) data

02. Demonstrates robustness to varying text sizes and to attacks

03. Outperforms all baseline detection methods

Abstract

Detecting content generated by large language models (LLMs) is crucial for preventing misuse and building trustworthy AI systems. Although existing detection methods perform well, their robustness in out-of-distribution (OOD) scenarios is still lacking. In this paper, we hypothesize that, compared to features used by existing detection methods, the internal representations of LLMs contain more comprehensive and raw features that can more effectively capture and distinguish the statistical pattern differences between LLM-generated texts (LGT) and human-written texts (HWT). We validated this hypothesis across different LLMs and observed significant differences in neural activation patterns when processing these two types of texts. Based on this, we propose RepreGuard, an efficient statistics-based detection method. Specifically, we first employ a surrogate model to collect representation…
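The abstract above is cut off before the method is described, so the following is only a toy sketch of the general idea of scoring texts with a statistic computed over representations. It is not the authors' procedure. Everything in it is an assumption made for illustration: the synthetic Gaussian vectors stand in for hidden states that a surrogate model would produce, the difference-of-means direction is a swapped-in, simple choice of statistic, and the helper names (`fit_direction`, `score`, `auroc`, `synthetic_reps`) are hypothetical.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_direction(lgt_reps, hwt_reps):
    """Difference-of-means direction pointing from HWT toward LGT.

    Assumption: a stand-in statistic, not the paper's feature extraction.
    """
    mu_lgt = mean_vector(lgt_reps)
    mu_hwt = mean_vector(hwt_reps)
    return [a - b for a, b in zip(mu_lgt, mu_hwt)]


def score(rep, direction):
    """Project one representation onto the direction (higher = more LGT-like)."""
    return sum(x * d for x, d in zip(rep, direction))


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the usual AUROC definition.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def synthetic_reps(rng, count, dim, shift):
    """Toy vectors: unit Gaussian noise, with the first 4 coordinates shifted.

    Assumption: replaces real pooled hidden states from a surrogate LLM.
    """
    return [
        [rng.gauss(shift if j < 4 else 0.0, 1.0) for j in range(dim)]
        for _ in range(count)
    ]


rng = random.Random(0)
DIM = 16

# "LGT" vectors carry a shift; "HWT" vectors do not.
train_lgt = synthetic_reps(rng, 200, DIM, 1.0)
train_hwt = synthetic_reps(rng, 200, DIM, 0.0)
test_lgt = synthetic_reps(rng, 100, DIM, 1.0)
test_hwt = synthetic_reps(rng, 100, DIM, 0.0)

direction = fit_direction(train_lgt, train_hwt)
test_auroc = auroc(
    [score(r, direction) for r in test_lgt],
    [score(r, direction) for r in test_hwt],
)
print(f"toy AUROC: {test_auroc:.3f}")
```

In a real setting the synthetic vectors would be replaced by representations collected from a model, and a decision threshold on the score would be chosen on held-out data. The AUROC printed here reflects only the toy data and says nothing about the reported 94.92%.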

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing