DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text
Xianjun Yang, Wei Cheng, Yue Wu, Linda Petzold, William Yang Wang, Haifeng Chen

TL;DR
DNA-GPT introduces a training-free, zero-shot method for detecting GPT-generated text by analyzing discrepancies in N-gram distributions after regenerating text segments, outperforming existing classifiers.
Contribution
The paper presents a novel training-free detection strategy that leverages divergence analysis of regenerated text segments to distinguish human from machine-generated content.
Findings
State-of-the-art detection accuracy on multiple datasets.
Outperforms OpenAI's classifier trained on large datasets.
Provides explainable evidence supporting detection decisions.
Abstract
Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLMs to regenerate the new remaining parts. By analyzing the differences between the original and new remaining parts through N-gram analysis in black-box or probability divergence in white-box, we unveil significant discrepancies…
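The black-box procedure described in the abstract (truncate, regenerate, compare N-grams) can be sketched roughly as follows. This is a minimal illustration, not the paper's scoring formula: the helper names `split_in_middle` and `overlap_score`, the whitespace tokenization, the N-gram range of 2 to 4, and the plain averaging are all assumptions made for the example. The regenerated continuations are passed in as plain strings; in practice they would come from prompting an LLM with the first half of the text.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def split_in_middle(text: str) -> Tuple[str, str]:
    """Truncate a text at its midpoint (whitespace tokens, an assumption)."""
    tokens = text.split()
    mid = len(tokens) // 2
    return " ".join(tokens[:mid]), " ".join(tokens[mid:])


def overlap_score(original_rest: str, regenerated: List[str],
                  n_min: int = 2, n_max: int = 4) -> float:
    """Average fraction of each regenerated continuation's n-grams that
    also occur in the original continuation. Higher values suggest the
    original text resembles what the model would have written itself.
    Simplified stand-in for illustration, not the paper's exact score."""
    if not regenerated:
        return 0.0
    ref_tokens = original_rest.split()
    n_values = range(n_min, n_max + 1)
    total = 0.0
    for candidate in regenerated:
        cand_tokens = candidate.split()
        for n in n_values:
            cand = ngrams(cand_tokens, n)
            if cand:  # skip n-gram sizes longer than the candidate
                total += len(ngrams(ref_tokens, n) & cand) / len(cand)
    return total / (len(regenerated) * len(n_values))


if __name__ == "__main__":
    prefix, rest = split_in_middle(
        "the cat sat on the mat and then the cat slept on the mat")
    # Hypothetical continuations an LLM might return for `prefix`.
    samples = ["then the cat slept on the rug", "it rained all day long"]
    print(prefix, "|", rest)
    print(overlap_score(rest, samples))
```

A detector built on this idea would compare the score against a threshold chosen on held-out data. The shared N-grams themselves can be shown to the user as the kind of explainable evidence the summary mentions.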
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Transformer
