Mutation-Based Adversarial Attacks on Neural Text Detectors
Gongbo Liang, Jesus Guerrero, Izzat Alsmadi

TL;DR
This paper introduces mutation-based adversarial attack methods targeting neural text detectors, using character and word mutations to generate samples that confuse classifiers and reduce detection accuracy.
Contribution
It proposes novel mutation operators inspired by software testing to craft white-box adversarial samples for neural text detection models.
Findings
Mutation operators effectively decrease detector accuracy
Adversarial samples significantly confuse neural classifiers
Attacks remain effective against state-of-the-art detectors
Abstract
Neural text detectors aim to identify the characteristics that distinguish neural (machine-generated) text from human-written text. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making detection increasingly difficult. Inspired by advances in mutation analysis for software development and testing, we propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art neural text detectors. This falls under white-box adversarial attacks: the attacker has access to the original text and creates mutated instances from it. The ultimate goal is to confuse machine learning models and classifiers and decrease their prediction accuracy.
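To make the idea of character- and word-based mutation operators concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not list the specific operators, so the look-alike character table (`HOMOGLYPHS`), the synonym table (`SYNONYMS`), the function names, and the `rate` parameter are all assumptions chosen for illustration.

```python
import random

# Hypothetical table mapping Latin letters to visually similar Cyrillic ones.
# Assumed for illustration; not taken from the paper.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

# Hypothetical synonym table, also assumed for illustration.
SYNONYMS = {"big": "large", "quick": "fast", "said": "stated"}


def mutate_characters(text, rate, rng):
    """Character-level mutation: swap each mappable character for its
    look-alike with probability `rate`."""
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


def mutate_words(text, rate, rng):
    """Word-level mutation: swap each word found in SYNONYMS for its
    synonym with probability `rate`. Splits on single spaces only."""
    out = []
    for word in text.split(" "):
        if word.lower() in SYNONYMS and rng.random() < rate:
            out.append(SYNONYMS[word.lower()])
        else:
            out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    original = "the quick model said the text was big"
    print(mutate_words(original, 0.5, rng))
    print(mutate_characters(original, 0.2, rng))
```

Each mutated string would then be passed to the detector under attack, and its prediction compared with the prediction on the original text. A character-level swap keeps the text visually almost unchanged for a human reader while changing the token sequence the model sees, which is why look-alike substitution is a common choice for this kind of sketch.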
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Topic Modeling
