Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack

Ying Zhou; Ben He; Le Sun

arXiv:2404.01907·cs.CL·April 3, 2024·3 cites

TL;DR

This paper demonstrates that current AI-text detection models are vulnerable to adversarial attacks that can quickly and easily evade detection, highlighting the need for more robust detection methods.

Contribution

The paper introduces a framework for adversarial attacks on AI-text detectors and evaluates their effectiveness, revealing vulnerabilities and exploring robustness improvements.

Findings

01. Detection models can be bypassed in as little as 10 seconds.

02. Adversarial learning can improve robustness but faces practical challenges.

03. Current detectors are significantly vulnerable to minor perturbations.

Abstract

With the development of large language models (LLMs), detecting whether text is generated by a machine is becoming increasingly challenging, yet it remains essential for countering the spread of false information, protecting intellectual property, and preventing academic plagiarism. While well-trained text detectors have demonstrated promising performance on unseen test data, recent research suggests that these detectors are vulnerable to adversarial attacks such as paraphrasing. In this paper, we propose a framework for a broader class of adversarial attacks, designed to apply minor perturbations to machine-generated content to evade detection. We consider two attack settings, white-box and black-box, and employ adversarial learning in dynamic scenarios to assess how far the robustness of current detection models against such attacks can be improved. The…
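To make the idea of "minor perturbations" in a black-box setting concrete, here is a minimal Python sketch of a generic greedy word-substitution attack. It is not the authors' framework: the detector (`toy_detector`), the word list, the synonym table, and the function `greedy_substitution_attack` are all hypothetical stand-ins invented for illustration. The only property it shares with the black-box setting described above is that the attacker sees detector scores and nothing else.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a detector: the score is the share of words
# that appear in a small "machine-sounding" word list. A real detector
# would be a trained classifier; this is only a toy.
MACHINE_WORDS = {"furthermore", "utilize", "moreover", "additionally"}

def toy_detector(text: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in MACHINE_WORDS for w in words) / len(words)

# Hypothetical substitution table (assumed, not taken from the paper).
SYNONYMS: Dict[str, List[str]] = {
    "furthermore": ["also"],
    "utilize": ["use"],
    "moreover": ["besides"],
    "additionally": ["also"],
}

def greedy_substitution_attack(
    text: str,
    score: Callable[[str], float],
    synonyms: Dict[str, List[str]],
    threshold: float,
    max_edits: int,
) -> Tuple[str, float, int]:
    """Swap one word at a time, always taking the swap that lowers the
    detector score the most, until the score drops below `threshold`
    or the edit budget is spent. Only score queries are used."""
    words = text.split()
    current = score(" ".join(words))
    edits = 0
    while current >= threshold and edits < max_edits:
        best = None  # (new_score, word_index, replacement)
        for i, w in enumerate(words):
            for cand in synonyms.get(w.lower(), []):
                trial = words[:i] + [cand] + words[i + 1:]
                s = score(" ".join(trial))
                if best is None or s < best[0]:
                    best = (s, i, cand)
        # Stop if no substitution exists or none improves the score.
        if best is None or best[0] >= current:
            break
        current, idx, replacement = best
        words[idx] = replacement
        edits += 1
    return " ".join(words), current, edits

if __name__ == "__main__":
    original = "furthermore we utilize the model and moreover we test it"
    adversarial, final_score, n_edits = greedy_substitution_attack(
        original, toy_detector, SYNONYMS, threshold=0.15, max_edits=5
    )
    print(toy_detector(original), "->", final_score, "after", n_edits, "edits")
    print(adversarial)
```

In this toy run two substitutions are enough to push the score under the threshold while most of the sentence is left untouched, which is the sense in which the perturbation is "minor". A white-box variant would instead use model internals such as gradients to choose which words to change; that is not shown here.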

Code & Models

Repositories

zhouying20/hmgc
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques