DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn

TL;DR
DetectGPT is a zero-shot method that identifies machine-generated text by analyzing the negative curvature regions of an LLM's log probability function, improving detection accuracy without training classifiers.
Contribution
The paper introduces a novel curvature-based criterion for detecting LLM-generated text that does not require training data or watermarking, leveraging the structure of the model's probability function.
Findings
DetectGPT outperforms existing zero-shot detection methods.
It achieves 0.95 AUROC in identifying fake news generated by GPT-NeoX.
The approach is effective without additional training or data collection.
Abstract
The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language…
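The curvature criterion described in the abstract can be illustrated with a minimal Python sketch. It compares the log probability of a passage with the average log probability of randomly perturbed copies; a large positive gap suggests the passage sits near a local maximum of the model's log probability, which is what the negative-curvature observation predicts for model-generated text. Everything named here is an assumption made for illustration: `toy_log_prob` is a hypothetical unigram scorer standing in for the LLM under test, `toy_perturb` is a hypothetical word-swap standing in for rewrites produced by a generic pre-trained language model, and the number of perturbations is arbitrary.

```python
import math
import random
import statistics
from typing import Callable


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str, random.Random], str],
    n_perturbations: int = 20,
    seed: int = 0,
) -> float:
    """Log probability of `text` minus the mean log probability of its perturbations."""
    rng = random.Random(seed)
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    return original - statistics.fmean(perturbed)


# Hypothetical stand-in for the scoring model: a tiny unigram "language model".
TOY_VOCAB = {"the": 0.2, "cat": 0.1, "sat": 0.1, "on": 0.1, "mat": 0.1}
UNKNOWN_PROB = 1e-4  # assumed probability for any word outside TOY_VOCAB


def toy_log_prob(text: str) -> float:
    """Sum of per-word log probabilities under the toy unigram model."""
    return sum(math.log(TOY_VOCAB.get(word, UNKNOWN_PROB)) for word in text.split())


# Hypothetical stand-in for the perturbation model: swap one word at random.
REPLACEMENTS = ["zebra", "quantum", "ledger"]


def toy_perturb(text: str, rng: random.Random) -> str:
    """Replace one randomly chosen word with a random out-of-vocabulary word."""
    words = text.split()
    position = rng.randrange(len(words))
    words[position] = rng.choice(REPLACEMENTS)
    return " ".join(words)


# A passage the toy model finds likely: every perturbation lowers its score.
d_likely = perturbation_discrepancy("the cat sat on the mat", toy_log_prob, toy_perturb)

# A passage the toy model already finds unlikely: perturbations change nothing.
d_unlikely = perturbation_discrepancy("zebra quantum ledger zebra", toy_log_prob, toy_perturb)

print(f"discrepancy (likely passage):   {d_likely:.3f}")
print(f"discrepancy (unlikely passage): {d_unlikely:.3f}")
```

In practice one would compare this discrepancy against a threshold to decide whether to flag a passage; the sketch leaves the threshold out because its value depends on the scoring model and the data.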
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
Methods: GPT-NeoX
