Attention Meets Perturbations: Robust and Interpretable Attention with Adversarial Training

Shunsuke Kitada; Hitoshi Iyatomi

arXiv:2009.12064·cs.CL·November 23, 2022

TL;DR

This paper introduces adversarial training techniques for attention mechanisms in NLP, significantly improving robustness and interpretability across multiple datasets.

Contribution

It proposes Attention AT and Attention iAT, novel adversarial training methods that enhance attention robustness and interpretability in NLP models.

Findings

01

Attention iAT achieved the best performance in 9 of 10 tasks

02

The resulting attention correlated more strongly with word importance

03

Techniques are less sensitive to perturbation size

Abstract

Although attention mechanisms have been applied to a variety of deep learning models and have been shown to improve prediction performance, they have been reported to be vulnerable to perturbations. To overcome this vulnerability, we draw on adversarial training (AT), a powerful regularization technique for enhancing model robustness. In this paper, we propose a general training technique for natural language processing tasks, including AT for attention (Attention AT) and more interpretable AT for attention (Attention iAT). The proposed techniques improved the prediction performance and the model interpretability by exploiting the mechanisms with AT. In particular, Attention iAT boosts those advantages by introducing adversarial perturbation, which enhances the difference in the attention of the sentences.…
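The abstract gives no formulas, so the following is only a minimal sketch of what applying AT to an attention mechanism could look like. It assumes the standard AT recipe: the perturbation is the gradient of the loss with respect to the pre-softmax attention scores, L2-normalized and scaled by a perturbation size `epsilon`. The toy classifier, the scalar `values`, and all function names are hypothetical and are not taken from the paper or its repository. Attention iAT is not covered here.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(scores, values, label):
    """Binary cross-entropy of a toy attention classifier, plus its
    gradient with respect to the pre-softmax attention scores."""
    alpha = softmax(scores)
    # Context: attention-weighted sum of (hypothetical) scalar word values.
    context = sum(a * v for a, v in zip(alpha, values))
    prob = 1.0 / (1.0 + math.exp(-context))
    loss = -(label * math.log(prob) + (1 - label) * math.log(1 - prob))
    # Chain rule: dL/dcontext = prob - label,
    # dcontext/dscore_j = alpha_j * (value_j - context).
    grad = [(prob - label) * a * (v - context) for a, v in zip(alpha, values)]
    return loss, grad

def adversarial_perturbation(grad, epsilon):
    # Assumed AT-style perturbation: epsilon * g / ||g||_2.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]

# Toy example: three words, positive label.
scores = [2.0, 0.5, -1.0]
values = [1.5, -0.5, -2.0]
label = 1

clean_loss, grad = loss_and_grad(scores, values, label)
r_adv = adversarial_perturbation(grad, epsilon=1.0)
perturbed_scores = [s + r for s, r in zip(scores, r_adv)]
adv_loss, _ = loss_and_grad(perturbed_scores, values, label)

# Assumed training objective: clean loss plus adversarial loss (weight 1).
total_loss = clean_loss + adv_loss
print(f"clean={clean_loss:.4f} adversarial={adv_loss:.4f} total={total_loss:.4f}")
```

In this toy case the perturbation shifts attention away from the word that supports the correct label, so the adversarial loss is higher than the clean loss. Training on both terms is what would push the model toward attention that is harder to disturb.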

Code & Models

Repositories

shunk031/attention-meets-perturbation (Official)

Taxonomy

Methods, Interpretability