Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness
Shixuan Ma, Quan Wang

TL;DR
This paper introduces token cohesiveness as a new feature for zero-shot detection of LLM-generated text, demonstrating its effectiveness in improving detection accuracy across various models and datasets.
Contribution
The paper proposes TOCSIN, a novel dual-channel detection method that leverages token cohesiveness to enhance zero-shot detection of LLM-generated text in black-box settings.
Findings
Token cohesiveness is higher in LLM-generated text than in human-written text.
TOCSIN improves detection performance when integrated with existing zero-shot detectors.
The approach is effective across multiple datasets, models, and evaluation scenarios.
Abstract
The increasing capability and widespread usage of large language models (LLMs) highlight the desirability of automatic detection of LLM-generated text. Zero-shot detectors, due to their training-free nature, have received considerable attention and notable success. In this paper, we identify a new feature, token cohesiveness, that is useful for zero-shot detection, and we demonstrate that LLM-generated text tends to exhibit higher token cohesiveness than human-written text. Based on this observation, we devise TOCSIN, a generic dual-channel detection paradigm that uses token cohesiveness as a plug-and-play module to improve existing zero-shot detectors. To calculate token cohesiveness, TOCSIN only requires a few rounds of random token deletion and semantic difference measurement, making it particularly suitable for a practical black-box setting where the source model used for generation…
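The procedure the abstract describes (a few rounds of random token deletion, each followed by a semantic difference measurement) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the `rounds` and `delete_ratio` defaults, and the bag-of-words `bow_distance` stand-in for a real semantic difference measure are all hypothetical choices made here so the example runs without external models. It also assumes that cohesiveness is taken as the average difference across rounds.

```python
import math
import random
from collections import Counter


def bow_distance(a_tokens, b_tokens):
    """Hypothetical stand-in for a semantic difference measure:
    1 - cosine similarity between bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def token_cohesiveness(text, rounds=10, delete_ratio=0.1,
                       distance=bow_distance, seed=0):
    """Average difference between the text and copies of it with
    randomly deleted tokens (all defaults are illustrative assumptions)."""
    rng = random.Random(seed)
    tokens = text.split()  # assumption: whitespace tokenization
    if len(tokens) < 2:
        return 0.0
    # Delete at least one token, but never all of them.
    n_delete = min(max(1, int(len(tokens) * delete_ratio)), len(tokens) - 1)
    diffs = []
    for _ in range(rounds):
        drop = set(rng.sample(range(len(tokens)), n_delete))
        kept = [t for i, t in enumerate(tokens) if i not in drop]
        diffs.append(distance(tokens, kept))
    return sum(diffs) / len(diffs)


score = token_cohesiveness(
    "the quick brown fox jumps over the lazy dog",
    rounds=5, delete_ratio=0.2,
)
```

Because the sketch only needs the text itself and a difference function, it does not require access to the model that produced the text, which matches the black-box setting the abstract mentions. In practice the `distance` argument would be replaced by a learned semantic similarity measure.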
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques
Methods: Softmax · Attention Is All You Need · Balanced Selection
