R&R: Metric-guided Adversarial Sentence Generation

Lei Xu; Alfredo Cuesta-Infante; Laure Berti-Equille; Kalyan Veeramachaneni

arXiv:2104.08453 · cs.CL · October 21, 2022 · 1 citation

TL;DR

This paper introduces R&R, a framework for generating high-quality adversarial sentences that are fluent, similar to original sentences, and effectively fool classifiers, improving robustness analysis of text models.

Contribution

The paper proposes a novel rewrite and rollback framework that optimizes a combined critique score to generate superior adversarial examples, balancing fluency, similarity, and misclassification.

Findings

01 Outperforms the state of the art in attack success rate by up to 16.2%

02 Effective across 5 datasets and 3 classifier architectures

03 Enhances the quality of adversarial examples in NLP tasks

Abstract

Adversarial examples are helpful for analyzing and improving the robustness of text classifiers. Generating high-quality adversarial examples is a challenging task as it requires generating fluent adversarial sentences that are semantically similar to the original sentences and preserve the original labels, while causing the classifier to misclassify them. Existing methods prioritize misclassification by maximizing each perturbation's effectiveness at misleading a text classifier; thus, the generated adversarial examples fall short in terms of fluency and similarity. In this paper, we propose a rewrite and rollback (R&R) framework for adversarial attack. It improves the quality of adversarial examples by optimizing a critique score which combines the fluency, similarity, and misclassification metrics. R&R generates high-quality adversarial examples by allowing exploration of…
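The abstract describes a critique score that combines fluency, similarity, and misclassification metrics. The Python sketch below is one hypothetical way such a combined score could be formed, not the formula used in the paper: the `Metrics` fields, the hinge-style penalties, the thresholds, and the weights are all assumptions made for illustration. Lower scores are better, and a score of 0 means a candidate meets all three targets.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical per-candidate measurements (names are illustrative)."""
    perplexity_ratio: float   # candidate perplexity / original perplexity
    similarity: float         # semantic similarity to the original, in [0, 1]
    true_label_margin: float  # true-label score minus best other-label score


def critique_score(m: Metrics,
                   max_ppl_ratio: float = 2.0,   # assumed fluency threshold
                   min_similarity: float = 0.9,  # assumed similarity threshold
                   w_fluency: float = 1.0,
                   w_similarity: float = 1.0,
                   w_misclf: float = 1.0) -> float:
    """Weighted sum of three penalties; each is 0 once its target is met."""
    # Penalize sentences that are much less fluent than the original.
    fluency_pen = max(0.0, m.perplexity_ratio - max_ppl_ratio)
    # Penalize sentences that drift too far from the original meaning.
    similarity_pen = max(0.0, min_similarity - m.similarity)
    # Penalize sentences the classifier still labels correctly.
    misclf_pen = max(0.0, m.true_label_margin)
    return (w_fluency * fluency_pen
            + w_similarity * similarity_pen
            + w_misclf * misclf_pen)


def pick_best(candidates):
    """Return the (sentence, metrics) pair with the lowest critique score."""
    return min(candidates, key=lambda c: critique_score(c[1]))


# Made-up measurements for three candidate rewrites.
candidates = [
    ("candidate a", Metrics(1.2, 0.95, 0.4)),   # fluent, similar, not misclassified
    ("candidate b", Metrics(1.5, 0.92, -0.1)),  # meets all three targets
    ("candidate c", Metrics(3.0, 0.50, -0.5)),  # misclassified but low quality
]
best_sentence, best_metrics = pick_best(candidates)
```

Scoring all three criteria together means that a candidate which fools the classifier but reads poorly (candidate c) ranks below one that satisfies every target (candidate b), which is the trade-off the abstract attributes to optimizing a combined score rather than misclassification alone.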

Code & Models

Repositories

DAI-Lab/fibber (tf, Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection