Effective and Imperceptible Adversarial Textual Attack via Multi-objectivization
Shengcai Liu, Ning Lu, Wenjing Hong, Chao Qian, Ke Tang

TL;DR
This paper introduces HydraText, a multi-objective evolutionary algorithm that crafts adversarial text attacks balancing success and imperceptibility, outperforming existing methods in effectiveness and subtlety.
Contribution
It reformulates adversarial text attack as a multi-objective optimization problem and proposes HydraText, the first approach effective in both score-based and decision-based attack settings.
Findings
HydraText achieves high attack success rates.
Its AEs are harder to distinguish from human-written text.
Adversarial training with HydraText improves model robustness.
Abstract
The field of adversarial textual attack has grown significantly over the last few years, and the commonly considered objective is to craft adversarial examples (AEs) that can successfully fool the target model. However, the imperceptibility of attacks, which is also essential for practical attackers, is often left out by previous studies. As a consequence, the crafted AEs tend to have obvious structural and semantic differences from the original human-written text, making them easily perceptible. In this work, we advocate leveraging multi-objectivization to address this issue. Specifically, we reformulate the problem of crafting AEs as a multi-objective optimization problem, where attack imperceptibility is considered as an auxiliary objective. Then, we propose a simple yet effective evolutionary algorithm, dubbed HydraText, to solve this problem. To the best of our knowledge,…
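The multi-objective view described in the abstract can be illustrated with a small sketch of Pareto dominance over two objectives. This is not the HydraText algorithm itself. The names `Candidate`, `attack_score`, `num_edits`, `dominates`, and `pareto_front` are hypothetical, and using the number of edited words as a proxy for imperceptibility is an assumption made only for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A candidate adversarial example with two illustrative objectives."""
    text: str
    attack_score: float  # assumed: higher means closer to fooling the model
    num_edits: int       # assumed imperceptibility proxy: fewer edits is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and better on one."""
    no_worse = a.attack_score >= b.attack_score and a.num_edits <= b.num_edits
    strictly_better = a.attack_score > b.attack_score or a.num_edits < b.num_edits
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [
    Candidate("a", 0.9, 3),
    Candidate("b", 0.9, 2),
    Candidate("c", 0.5, 1),
    Candidate("d", 0.4, 2),
]
front = pareto_front(candidates)
```

Under these assumptions, a multi-objective evolutionary search would keep a set of such non-dominated candidates instead of a single best one, so that a successful attack with few edits is preferred over an equally successful one with many.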
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Hate Speech and Cyberbullying Detection
