Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation

Chong Zhang; Jieyu Zhao; Huan Zhang; Kai-Wei Chang; Cho-Jui Hsieh

arXiv:2104.05232·cs.CL·April 13, 2021·1 cites

TL;DR

This paper introduces a double perturbation framework to evaluate the robustness and counterfactual bias of NLP models by testing their stability against slight dataset perturbations and single-word substitutions.

Contribution

It proposes a novel double perturbation method to uncover model weaknesses and hidden biases beyond standard test datasets in NLP.

Findings

1. The proposed attack attains high success rates (96.0%-99.8%) in finding vulnerable examples.

2. The framework reveals hidden model biases that are not evident in the original test data.

3. The approach is effective for testing robustness and bias in CNNs and Transformers.

Abstract

Robustness and counterfactual bias are usually evaluated on a test dataset. However, are these evaluations robust? If the test dataset is perturbed slightly, will the evaluation results remain the same? In this paper, we propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. The framework first perturbs the test dataset to construct abundant natural sentences similar to the test data, and then diagnoses the prediction change regarding a single-word substitution. We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English. (1) For robustness, we focus on synonym substitutions and identify vulnerable examples where prediction can be altered. Our proposed attack attains high success rates (96.0%-99.8%) in finding vulnerable examples on both original and robustly…
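The two-step idea in the abstract (perturb a test sentence into nearby sentences, then check whether a single-word substitution changes the prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table `SYNONYMS`, the word weights, and the functions `toy_predict`, `single_word_substitutions`, and `find_vulnerable_example` are all hypothetical, and the neighborhood here is simply "one synonym swap away" rather than a set of sentences checked for naturalness.

```python
from typing import Callable, Dict, List, Optional, Tuple

Sentence = List[str]

# Hypothetical synonym table (an assumption for illustration only).
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "great"],
    "fine": ["good"],
    "movie": ["film"],
}

# Hypothetical word weights for a toy bag-of-words sentiment classifier.
WEIGHTS: Dict[str, float] = {"good": 2.0, "great": 2.0, "fine": 0.5, "film": -1.0}


def toy_predict(tokens: Sentence) -> int:
    """Return 1 (positive) if the summed word weights exceed 0, else 0."""
    return 1 if sum(WEIGHTS.get(t, 0.0) for t in tokens) > 0 else 0


def single_word_substitutions(tokens: Sentence,
                              synonyms: Dict[str, List[str]]) -> List[Sentence]:
    """All sentences that differ from `tokens` by exactly one synonym swap."""
    return [
        tokens[:i] + [alt] + tokens[i + 1:]
        for i, tok in enumerate(tokens)
        for alt in synonyms.get(tok, [])
    ]


def find_vulnerable_example(
    tokens: Sentence,
    predict: Callable[[Sentence], int],
    synonyms: Dict[str, List[str]],
) -> Optional[Tuple[Sentence, Sentence]]:
    """Search the test sentence and its one-swap neighbors (first perturbation)
    for a sentence whose prediction flips under one more swap (second perturbation)."""
    for x0 in [tokens] + single_word_substitutions(tokens, synonyms):
        label = predict(x0)
        for x1 in single_word_substitutions(x0, synonyms):
            if predict(x1) != label:
                return x0, x1
    return None


pair = find_vulnerable_example(["good", "movie"], toy_predict, SYNONYMS)
print(pair)  # (['fine', 'movie'], ['fine', 'film'])
```

In this toy setting the original sentence "good movie" keeps its label under every single swap, so a test-set-only check would report it as robust, while its neighbor "fine movie" flips to negative when "movie" is replaced by "film".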

Code & Models

Repositories

chong-z/nlp-second-order-attack (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques