Unifying Model Explainability and Robustness for Joint Text Classification and Rationale Extraction

Dongfang Li; Baotian Hu; Qingcai Chen; Tujie Xu; Jingcong Tao; Yunan Zhang

arXiv:2112.10424·cs.CL·December 21, 2021

TL;DR

This paper introduces AT-BMC, a joint model for text classification and rationale extraction that enhances robustness against adversarial attacks and improves explanation quality by combining adversarial training and boundary-guided rationale localization.

Contribution

It presents a novel joint model that unifies explainability and robustness, leveraging mixed adversarial training and boundary match constraints for improved performance.

Findings

1. Outperforms baselines in classification and rationale extraction.

2. Reduces attack success rate by up to 69%.

3. Shows a connection between robustness and better explanations.
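To make the attack success rate finding concrete, here is a minimal sketch of how such a rate is commonly computed: the fraction of originally correct predictions that an attack manages to flip. The function name, the toy labels, and the toy predictions are all illustrative assumptions, not data from the paper. This page does not state whether the 69% figure is an absolute or a relative reduction, so the sketch computes both.

```python
from typing import Sequence


def attack_success_rate(
    labels: Sequence[int],
    clean_preds: Sequence[int],
    attacked_preds: Sequence[int],
) -> float:
    """Fraction of correctly classified examples whose prediction the attack flips.

    Assumed definition (common in adversarial NLP evaluation); the paper's
    exact protocol is not given on this page.
    """
    attempted = 0
    succeeded = 0
    for y, clean, adv in zip(labels, clean_preds, attacked_preds):
        if clean != y:
            continue  # examples the model already got wrong are not attacked
        attempted += 1
        if adv != y:
            succeeded += 1
    return succeeded / attempted if attempted else 0.0


# Toy data (hypothetical): gold labels, clean predictions, and predictions
# under attack for a baseline model and a more robust model.
labels = [1, 0, 1, 1, 0]
clean = [1, 0, 1, 0, 0]
baseline_adv = [0, 1, 0, 0, 0]
robust_adv = [1, 0, 0, 0, 0]

baseline_asr = attack_success_rate(labels, clean, baseline_adv)
robust_asr = attack_success_rate(labels, clean, robust_adv)

absolute_reduction = baseline_asr - robust_asr
relative_reduction = absolute_reduction / baseline_asr

print(f"baseline ASR: {baseline_asr:.2f}, robust ASR: {robust_asr:.2f}")
print(f"absolute reduction: {absolute_reduction:.2f}")
print(f"relative reduction: {relative_reduction:.2%}")
```

For simplicity the sketch uses the same clean predictions for both models; in a real comparison each model would be evaluated on its own correctly classified examples.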

Abstract

Recent works have shown explainability and robustness are two crucial ingredients of trustworthy and reliable text classification. However, previous works usually address one of two aspects: i) how to extract accurate rationales for explainability while being beneficial to prediction; ii) how to make the predictive model robust to different types of adversarial attacks. Intuitively, a model that produces helpful explanations should be more robust against adversarial attacks, because we cannot trust the model that outputs explanations but changes its prediction under small perturbations. To this end, we propose a joint classification and rationale extraction model named AT-BMC. It includes two key mechanisms: mixed Adversarial Training (AT) is designed to use various perturbations in discrete and embedding space to improve the model's robustness, and Boundary Match Constraint (BMC) helps…
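As a rough illustration of what a perturbation in embedding space means, the sketch below applies a single gradient-based, norm-bounded perturbation to the input embedding of a toy logistic classifier. This is not the authors' implementation: the classifier, the single-step update, and every name and value (`adversarial_embedding`, `epsilon`, the weights) are assumptions chosen to keep the example self-contained. Adversarial training would then include the loss on the perturbed embedding alongside the clean loss.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(embedding: List[float], weights: List[float], bias: float) -> float:
    """Probability of the positive class under a toy logistic classifier."""
    return sigmoid(sum(w * e for w, e in zip(weights, embedding)) + bias)


def bce_loss(prob: float, label: int) -> float:
    """Binary cross-entropy for a single example."""
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def adversarial_embedding(
    embedding: List[float],
    label: int,
    weights: List[float],
    bias: float,
    epsilon: float,
) -> List[float]:
    """Move the embedding by epsilon along the normalized loss gradient."""
    prob = predict_proba(embedding, weights, bias)
    # For logistic regression, d(BCE)/d(embedding) = (prob - label) * weights.
    grad = [(prob - label) * w for w in weights]
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)  # no gradient signal, leave the input unchanged
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]


# Hypothetical values for illustration only.
embedding = [0.5, -1.0, 2.0]
weights = [1.0, -0.5, 0.25]
bias = 0.0
label = 1
epsilon = 0.5

clean_prob = predict_proba(embedding, weights, bias)
adv = adversarial_embedding(embedding, label, weights, bias, epsilon)
adv_prob = predict_proba(adv, weights, bias)

clean_loss = bce_loss(clean_prob, label)
adv_loss = bce_loss(adv_prob, label)
# A simple adversarial-training objective would combine both terms.
combined_loss = clean_loss + adv_loss

print(f"clean p(y=1) = {clean_prob:.3f}, adversarial p(y=1) = {adv_prob:.3f}")
```

The perturbation has L2 norm `epsilon` by construction, so the embedding moves only a small, bounded distance while the loss on the true label increases.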

Code & Models

Repositories

crazyofapple/at-bmc (PyTorch, official)

Videos

Unifying Model Explainability and Robustness for Joint Text Classification and Rationale Extraction (Underline)

Taxonomy

Topics: Adversarial Robustness in Machine Learning