Mutation-Based Adversarial Attacks on Neural Text Detectors

Gongbo Liang; Jesus Guerrero; Izzat Alsmadi

arXiv:2302.05794·cs.CR·February 14, 2023·1 cites

Open Access

TL;DR

This paper introduces mutation-based adversarial attack methods targeting neural text detectors, using character and word mutations to generate samples that confuse classifiers and reduce detection accuracy.

Contribution

It proposes novel mutation operators inspired by software testing to craft white-box adversarial samples for neural text detection models.

Findings

01 Mutation operators effectively decrease detector accuracy

02 Adversarial samples significantly confuse neural classifiers

03 The attack is effective against state-of-the-art detectors

Abstract

Neural text detectors aim to identify the characteristics that distinguish neural (machine-generated) from human texts. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making the detection task increasingly difficult. Inspired by advances in mutation analysis in software development and testing, we propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art neural text detectors. This approach falls under white-box adversarial attacks, in which attackers have access to the original text and create mutation instances from it. The ultimate goal is to confuse machine learning classifiers and decrease their prediction accuracy.
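As a rough illustration of what character- and word-based mutation operators could look like, the sketch below generates mutated copies of an original text. It is not the authors' implementation: the look-alike character table, the misspelling table, the function names (`mutate_characters`, `mutate_words`, `generate_mutants`), and the `rate` parameter are all assumptions made for this example.

```python
import random

# Hypothetical table: Latin letters mapped to visually similar Cyrillic letters.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441", "p": "\u0440"}

# Hypothetical word-level substitutions (common misspellings), for illustration only.
MISSPELLINGS = {"because": "becuase", "their": "thier", "receive": "recieve"}


def mutate_characters(text, rate, rng):
    """Character operator: swap each eligible letter for its look-alike with probability `rate`."""
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


def mutate_words(text, rate, rng):
    """Word operator: swap each eligible word for a misspelling with probability `rate`."""
    words = text.split(" ")
    mutated = [
        MISSPELLINGS[w] if w in MISSPELLINGS and rng.random() < rate else w
        for w in words
    ]
    return " ".join(mutated)


def generate_mutants(original, n, rate=0.3, seed=0):
    """Create `n` mutation instances of `original`, picking one operator per instance."""
    rng = random.Random(seed)
    mutants = []
    for _ in range(n):
        operator = rng.choice([mutate_characters, mutate_words])
        mutants.append(operator(original, rate, rng))
    return mutants
```

Each mutant would then be passed to the detector under test, and the drop in accuracy relative to the unmodified text gives a measure of how much the mutations confuse it. The detector itself is outside the scope of this sketch.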

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Topic Modeling