How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts

Tharindu Kumarage, Paras Sheth, Raha Moraffah, Joshua Garland, Huan Liu

arXiv:2310.05095 · cs.CL · October 10, 2023 · 1 citation

TL;DR

This paper introduces a universal soft prompt technique that guides language models to generate human-like text capable of evading existing AI-generated text detectors, revealing limitations in current detection methods.

Contribution

The study proposes a novel universal evasive soft prompt approach that can be transferred across models to effectively bypass high-performing AI-generated text detectors.

Findings

1. Evasive soft prompts significantly reduce detector accuracy.
2. Transferring the prompts across models is effective.
3. Detectors are vulnerable to soft-prompt-based evasion.

Abstract

In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the issue of misuse associated with AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and the Stanford DetectGPT. In our study, we ask how reliable these detectors are. We answer the question by designing a novel approach that can prompt any PLM to generate text that evades these high-performing detectors. The proposed approach suggests a universal evasive prompt, a novel type of soft prompt, which guides PLMs in producing "human-like" text that can mislead the detectors. The novel universal evasive prompt is achieved in two steps: First, we create an evasive soft prompt tailored to a specific PLM through prompt tuning; and then, we leverage the transferability of soft…
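The first step above (tuning an evasive soft prompt for one specific PLM while the model itself stays frozen) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the "PLM" is a hypothetical frozen tanh layer that maps a continuous prompt vector to a hidden representation, the "detector" is a hypothetical frozen logistic scorer returning P(AI-generated), and the prompt is updated by plain gradient descent on the differentiable loss -log P(human). Real generation samples discrete tokens, and this excerpt does not describe the exact optimization procedure the authors use, so the names `plm_hidden`, `detector_logit`, `p_ai`, and `tune_evasive_prompt` and the weights `W`, `b`, `v`, `c` are illustrative only. The point is the shape of the idea: only the soft prompt receives updates, and the training signal comes from the detector.

```python
import math
import random

# HYPOTHETICAL toy stand-ins, not real models: a frozen "PLM" maps a soft
# prompt vector to a hidden representation, and a frozen "detector" scores it.
DIM = 4  # soft-prompt / hidden size (illustrative)

rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]  # frozen PLM weights
b = [rng.uniform(-1, 1) for _ in range(DIM)]                       # frozen PLM bias
v = [rng.uniform(-1, 1) for _ in range(DIM)]                       # frozen detector weights
c = 0.5                                                             # frozen detector bias


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def plm_hidden(prompt: list[float]) -> list[float]:
    """Frozen toy generator: h_j = tanh(sum_k W[j][k] * prompt[k] + b[j])."""
    return [math.tanh(sum(W[j][k] * prompt[k] for k in range(DIM)) + b[j])
            for j in range(DIM)]


def detector_logit(h: list[float]) -> float:
    """Frozen toy detector: logit of P(AI-generated | h)."""
    return sum(v[j] * h[j] for j in range(DIM)) + c


def p_ai(prompt: list[float]) -> float:
    """Probability the toy detector assigns to 'AI-generated'."""
    return sigmoid(detector_logit(plm_hidden(prompt)))


def tune_evasive_prompt(steps: int = 200, lr: float = 0.05) -> list[float]:
    """Update ONLY the soft prompt to minimize -log P(human) = softplus(logit)."""
    prompt = [0.0] * DIM
    for _ in range(steps):
        h = plm_hidden(prompt)
        s = detector_logit(h)
        dloss_ds = sigmoid(s)  # derivative of softplus(s) with respect to s
        # Chain rule through the frozen detector and frozen tanh layer.
        grad = [
            dloss_ds * sum(v[j] * (1 - h[j] ** 2) * W[j][k] for j in range(DIM))
            for k in range(DIM)
        ]
        # W, b, v, c are never modified: the PLM and detector stay frozen.
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt


initial = p_ai([0.0] * DIM)
tuned = tune_evasive_prompt()
print(f"P(AI) before tuning: {initial:.3f}, after tuning: {p_ai(tuned):.3f}")
```

Keeping the generator and detector frozen mirrors the prompt-tuning setup: the only trainable parameters are the few prompt values, which is what makes a small, portable prompt a plausible object to transfer to other PLMs in the second step.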

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education