Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection
Pablo Miralles-González, Javier Huertas-Tato, Alejandro Martín, David Camacho

TL;DR
This paper introduces PAWN, a method for detecting AI-generated text that weights next-token distribution metrics per token instead of averaging them uniformly, improving accuracy across domains, models, and languages while reducing training resources.
Contribution
The paper proposes PAWN, a weighted network leveraging last hidden states and next-token distribution metrics, enhancing detection performance and robustness over existing zero-shot and fine-tuned methods.
Findings
PAWN outperforms fine-tuned LMs on in-distribution detection.
PAWN generalizes better to unseen domains and models.
PAWN is more robust to adversarial attacks and multilingual scenarios.
Abstract
The rapid advancement in large language models (LLMs) has significantly enhanced their ability to generate coherent and contextually relevant text, raising concerns about the misuse of AI-generated content and making it critical to detect it. However, the task remains challenging, particularly in unseen domains or with unfamiliar LLMs. Leveraging LLM next-token distribution outputs offers a theoretically appealing approach for detection, as they encapsulate insights from the models' extensive pre-training on diverse corpora. Despite its promise, zero-shot methods that attempt to operationalize these outputs have met with limited success. We hypothesize that one of the problems is that they use the mean to aggregate next-token distribution metrics across tokens, when some tokens are naturally easier or harder to predict and should be weighted differently. Based on this idea, we propose…
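The contrast the abstract draws between mean aggregation and per-token weighting can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax weighting, the function names, and the hand-picked metric and score values below are all assumptions made for illustration. In PAWN the per-token weights are described as coming from a learned network, which is not reproduced here.

```python
import math


def mean_aggregate(metrics):
    """Uniform average of per-token metrics (the zero-shot baseline style)."""
    return sum(metrics) / len(metrics)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_aggregate(metrics, scores):
    """Weighted sum of per-token metrics, with weights given by softmax(scores).

    `scores` stands in for the output of a learned weighting network
    (an assumption for this sketch; here they are supplied by hand).
    """
    weights = softmax(scores)
    return sum(w * x for w, x in zip(weights, metrics))


# Hypothetical per-token metric, e.g. negative log-likelihood of each token.
token_nll = [0.1, 0.2, 3.0, 0.1]
# Hypothetical per-token scores: the third token is treated as more informative.
token_scores = [0.0, 0.0, 2.0, 0.0]

uniform = mean_aggregate(token_nll)
weighted = weighted_aggregate(token_nll, token_scores)
print(f"mean aggregation:     {uniform:.3f}")
print(f"weighted aggregation: {weighted:.3f}")
```

With equal scores the weighted aggregate reduces to the plain mean, so uniform averaging is the special case in which every token is treated as equally informative.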
Taxonomy
Topics: Topic Modeling · Handwritten Text Recognition Techniques · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need
