Can Rationalization Improve Robustness?
Howard Chen, Jacqueline He, Karthik Narasimhan, Danqi Chen

TL;DR
This paper investigates whether neural rationale models can enhance robustness against adversarial attacks in NLP by generating explanations that potentially ignore noise, revealing mixed results and highlighting areas for improvement.
Contribution
It systematically evaluates the robustness of rationale models against adversarial attacks across multiple NLP tasks, exploring the impact of interpretability on model resilience.
Findings
Rationale models can improve robustness but are sensitive to positional bias.
They struggle when adversarial text exploits lexical choices.
Using human rationales does not always enhance robustness.
Abstract
A growing line of work has investigated the development of neural NLP models that can produce rationales--subsets of input that can explain their model predictions. In this paper, we ask whether such rationale models can also provide robustness to adversarial attacks in addition to their interpretable nature. Since these models need to first generate rationales ("rationalizer") before making predictions ("predictor"), they have the potential to ignore noise or adversarially added text by simply masking it out of the generated rationale. To this end, we systematically generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks, and perform an extensive empirical evaluation of state-of-the-art rationale models across five different tasks. Our experiments reveal that the rationale models show the promise to improve robustness, while they struggle in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
