Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood
Yang Xu, Yu Wang, Hao An, Zhichen Liu, Yongyuan Li

TL;DR
This paper uses the spectrum of relative likelihoods to detect subtle differences between human and AI-generated texts, achieving state-of-the-art results on short-text detection and providing insights rooted in psycholinguistics.
Contribution
It presents a new detection method based on relative likelihood spectrum analysis that is competitive with previous zero-shot techniques, sets a new state of the art on short texts, and reveals subtle language differences with theoretical backing.
Findings
Achieves state-of-the-art short-text detection performance
Provides a new perspective using relative likelihood spectrum analysis
Reveals subtle linguistic differences rooted in psycholinguistics
Abstract
Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, this is becoming increasingly difficult as language models' capabilities of generating human-like text keep evolving. This study provides a new perspective by using relative likelihood values instead of absolute ones and extracting useful features from the spectrum view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, one supervised and one heuristic-based, which achieves performance competitive with previous zero-shot detection methods and a new state of the art on short-text detection. Our method can also reveal subtle differences between human and model languages, which find theoretical roots in psycholinguistics studies. Our code is available at…
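The abstract does not spell out how relative likelihood or its spectrum is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that "relative" means z-scoring the per-token log-likelihoods of a text, and that the "spectrum view" is the magnitude of a discrete Fourier transform of that normalized sequence. The function names and the example values are hypothetical.

```python
import numpy as np


def relative_likelihood(log_probs):
    """Z-score a sequence of per-token log-likelihoods.

    Assumption: "relative likelihood" is modeled here as standardization
    (zero mean, unit variance); the abstract does not confirm this choice.
    """
    x = np.asarray(log_probs, dtype=float)
    std = x.std()
    if std == 0:
        # A constant sequence carries no relative information.
        return np.zeros_like(x)
    return (x - x.mean()) / std


def likelihood_spectrum(log_probs):
    """Return (frequencies, magnitudes) of the normalized sequence.

    Assumption: the "spectrum view" is taken to be the magnitude of the
    real-input FFT; other transforms would also fit the abstract's wording.
    """
    z = relative_likelihood(log_probs)
    freqs = np.fft.rfftfreq(len(z))      # cycles per token, from 0 to 0.5
    magnitudes = np.abs(np.fft.rfft(z))  # one value per frequency bin
    return freqs, magnitudes


# Hypothetical per-token log-likelihoods, as a language model might score a text.
example = [-2.1, -0.3, -4.0, -1.2, -0.8, -3.3, -0.5, -2.7]
freqs, magnitudes = likelihood_spectrum(example)
```

The resulting magnitudes could then serve as input features for a supervised classifier or a heuristic rule, which are the two classification routes the abstract mentions. Because the sequence is centered before the transform, the zero-frequency bin is approximately zero, so only the shape of the fluctuations contributes.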
Taxonomy
Topics: Fractal and DNA sequence analysis
