Technical Report on the Pangram AI-Generated Text Classifier

Bradley Emi; Max Spero

arXiv:2402.14873 · cs.CL · July 30, 2024 · 1 citation

TL;DR

This paper introduces Pangram Text, a transformer-based AI text classifier that significantly outperforms existing tools across diverse domains and models, with low false positive rates and broad generalization.

Contribution

The paper presents a transformer-based classifier and a training algorithm, hard negative mining with synthetic mirrors, which the authors report lowers false positive rates and improves generalization in AI-generated text detection.

Findings

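01. Outperforms zero-shot detection methods such as DetectGPT and leading commercial tools

02. Achieves over 38 times lower error rates on a benchmark spanning multiple text domains and 8 open- and closed-source large language models

03. Maintains low false positive rates and generalizes to domains and models unseen during training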

Abstract

We present Pangram Text, a transformer-based neural network trained to distinguish text written by large language models from text written by humans. Pangram Text outperforms zero-shot methods such as DetectGPT as well as leading commercial AI detection tools, with over 38 times lower error rates on a comprehensive benchmark comprising 10 text domains (student writing, creative writing, scientific writing, books, encyclopedias, news, email, scientific papers, short-form Q&A) and 8 open- and closed-source large language models. We propose a training algorithm, hard negative mining with synthetic mirrors, that enables our classifier to achieve orders of magnitude lower false positive rates on high-data domains such as reviews. Finally, we show that Pangram Text is not biased against nonnative English speakers and generalizes to domains and models unseen during training.
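
The abstract names hard negative mining with synthetic mirrors but does not spell out its steps. The following is a minimal Python sketch of one plausible reading, not the authors' implementation: score a pool of human text with the current classifier, keep the human examples it most confidently mislabels as AI (the hard negatives), and pair each with a machine-generated counterpart before retraining. Treating a "mirror" as such a counterpart is an assumption here. `toy_score` and `toy_mirror` are hypothetical stand-ins for a trained classifier and an LLM call, and the threshold, `k`, and word list are purely illustrative.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label): 0 = human, 1 = AI


def mine_hard_negatives(
    score: Callable[[str], float],
    human_pool: List[str],
    threshold: float = 0.5,
    k: int = 2,
) -> List[str]:
    """Return up to k human texts the classifier most confidently calls AI."""
    false_positives = [(score(t), t) for t in human_pool if score(t) >= threshold]
    false_positives.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in false_positives[:k]]


def augment_with_mirrors(
    train_set: List[Example],
    hard_negatives: List[str],
    make_mirror: Callable[[str], str],
) -> List[Example]:
    """Add each hard negative (human) plus its assumed synthetic mirror (AI)."""
    augmented = list(train_set)
    for text in hard_negatives:
        augmented.append((text, 0))               # human false positive
        augmented.append((make_mirror(text), 1))  # machine-made counterpart
    return augmented


# Hypothetical stand-in for a trained classifier: fraction of "flagged" words.
def toy_score(text: str) -> float:
    flagged = {"delve", "tapestry", "furthermore"}
    words = text.lower().split()
    return sum(w in flagged for w in words) / max(len(words), 1)


# Hypothetical stand-in for an LLM call that writes a matching AI text.
def toy_mirror(text: str) -> str:
    return "[mirror] " + text


train = [("hello there", 0), ("as an ai language model", 1)]
pool = [
    "we delve furthermore",
    "great pizza fast delivery",
    "furthermore tapestry delve tapestry",
]

hard = mine_hard_negatives(toy_score, pool)
augmented = augment_with_mirrors(train, hard, toy_mirror)
# A real pipeline would now retrain on `augmented` and repeat the loop.
```

In this sketch the augmented set stays balanced: each mined human example arrives with one AI-labeled counterpart, so the classifier sees both sides of the cases it currently gets wrong.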

Taxonomy

Topics: Text and Document Classification Technologies