Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

Kalpesh Krishna; Yixiao Song; Marzena Karpinska; John Wieting; Mohit Iyyer

arXiv:2303.13408·cs.CL·October 19, 2023·89 cites

TL;DR

Paraphrasing AI-generated text can effectively evade detection methods, but a retrieval-based approach can robustly identify such paraphrases, enhancing detection resilience.

Contribution

We develop DIPPER, a paraphrase model that evades detectors, and propose a retrieval-based defense that significantly improves detection robustness against paraphrasing attacks.
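The retrieval-based defense can be illustrated with a minimal sketch, assuming it works by keeping a store of previously generated texts and flagging a candidate whose closest stored match is similar enough. The `GenerationStore` class, the bag-of-words cosine similarity, and the `0.6` threshold are all hypothetical choices made for illustration; they stand in for whatever retriever and threshold the authors actually use.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a semantic text encoder."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class GenerationStore:
    """Hypothetical store of texts a language model has previously generated."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # illustrative value, not taken from the paper
        self.entries = []

    def add(self, generated_text):
        """Record a generation together with its vector."""
        self.entries.append((generated_text, bag_of_words(generated_text)))

    def best_match(self, candidate):
        """Return (similarity, stored_text) for the closest stored generation."""
        query = bag_of_words(candidate)
        best = (0.0, None)
        for text, vector in self.entries:
            score = cosine(query, vector)
            if score > best[0]:
                best = (score, text)
        return best

    def is_flagged(self, candidate):
        """Flag the candidate if any stored generation is similar enough."""
        return self.best_match(candidate)[0] >= self.threshold
```

In this sketch a paraphrase that reorders or swaps a few words still shares most of its vocabulary with the stored original, so it is flagged, while unrelated text scores near zero. A word-overlap measure would miss heavier rewording, which is why a real system would need a semantic retriever.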

Findings

1. DIPPER reduces detection accuracy of several detectors from over 70% to below 5%.

2. Retrieval-based defense detects 80-97% of paraphrased texts with minimal false positives.

3. Our models, code, and data are open-sourced.

Abstract

The rise in malicious usage of large language models, such as fake content creation and academic plagiarism, has motivated the development of approaches that identify AI-generated text, including those based on watermarking or outlier detection. However, the robustness of these detection algorithms to paraphrases of AI-generated text remains unclear. To stress test these detectors, we build an 11B-parameter paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering. Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking, GPTZero, DetectGPT, and OpenAI's text classifier. For example, DIPPER drops detection accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of…
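The abstract reports detection accuracy at a constant false positive rate. A minimal sketch of that kind of measurement follows, assuming a detector that assigns each text a score where higher means "more likely AI-generated". The function names and the idea of calibrating the threshold on human-written texts are illustrative assumptions, not the authors' evaluation code.

```python
import math


def threshold_at_fpr(human_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of human texts score above it."""
    ranked = sorted(human_scores, reverse=True)
    # Number of human-written texts we are allowed to flag by mistake.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")
    # Only scores strictly above ranked[allowed] are flagged, so at most
    # `allowed` human texts can exceed the threshold, even with ties.
    return ranked[allowed]


def detection_rate(ai_scores, threshold):
    """Fraction of AI-generated texts whose score exceeds the threshold."""
    return sum(score > threshold for score in ai_scores) / len(ai_scores)
```

Holding the threshold fixed this way makes numbers comparable across conditions: the same cutoff is applied to original and paraphrased AI text, so a drop in detection rate reflects the paraphrasing rather than a change in how strict the detector is.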

Code & Models

Repositories

martiansideofthemoon/ai-detection-paraphrases
PyTorch · Official

Videos

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense · SlidesLive

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Authorship Attribution and Profiling