Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading

Evan Crothers; Herna Viktor; Nathalie Japkowicz

arXiv:2308.06795·cs.CL·June 4, 2024

TL;DR

This paper examines faithfulness metrics based on iterative masking, a common way of quantifying the interpretability of neural text classifiers, and shows that these scores can be misleading when used to compare models.

Contribution

The paper proposes that what these metrics measure is better described as "sensitivity to iterative masking", highlights pitfalls in using this measure to compare text classifier interpretability, and offers guidance for more reliable interpretability evaluation.

Findings

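01 Iterative masking produces large variation in faithfulness scores between otherwise comparable Transformer encoder text classifiers.

02 Iteratively masked samples produce embeddings outside the distribution seen during training, resulting in unpredictable behaviour.

03 Iterative masking has an underlying similarity to salience-based adversarial attacks, which undermines principled comparison of interpretability.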

Abstract

A common approach to quantifying neural text classifier interpretability is to calculate faithfulness metrics based on iteratively masking salient input tokens and measuring changes in the model prediction. We propose that this property is better described as "sensitivity to iterative masking", and highlight pitfalls in using this measure for comparing text classifier interpretability. We show that iterative masking produces large variation in faithfulness scores between otherwise comparable Transformer encoder text classifiers. We then demonstrate that iteratively masked samples produce embeddings outside the distribution seen during training, resulting in unpredictable behaviour. We further explore task-specific considerations that undermine principled comparison of interpretability using iterative masking, such as an underlying similarity to salience-based adversarial attacks. Our…
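
The iterative masking procedure described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical toy bag-of-words classifier with made-up weights (`WEIGHTS`, `predict_proba`), a leave-one-out salience ranking, and a mean-probability-drop summary score (`masking_sensitivity`). None of these names or choices come from the paper, which studies Transformer encoder classifiers; the sketch only shows the general shape of such a metric.

```python
import math

MASK = "[MASK]"

# Hypothetical toy classifier: bag-of-words logistic model with made-up weights.
WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "bad": -2.0}


def predict_proba(tokens):
    """Return P(positive); unknown tokens, including MASK, contribute 0 to the logit."""
    logit = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def salience_ranking(tokens):
    """Rank positions by the absolute prediction change when each token is masked alone."""
    base = predict_proba(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        scores.append((abs(base - predict_proba(masked)), i))
    # Most salient first; ties are broken by position.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


def iterative_masking_curve(tokens):
    """Mask tokens cumulatively in salience order, recording P(positive) after each step."""
    current = list(tokens)
    curve = [predict_proba(current)]
    for i in salience_ranking(tokens):
        current[i] = MASK
        curve.append(predict_proba(current))
    return curve


def masking_sensitivity(tokens):
    """Assumed summary score: mean drop in P(positive) across all masking steps."""
    curve = iterative_masking_curve(tokens)
    steps = curve[1:]
    if not steps:
        return 0.0
    return sum(curve[0] - p for p in steps) / len(steps)


if __name__ == "__main__":
    example = ["a", "great", "film"]
    print(iterative_masking_curve(example))
    print(masking_sensitivity(example))
```

In this toy model a masked token simply contributes nothing to the logit, so the score is well behaved. The abstract's point is that a real masked language model offers no such guarantee: heavily masked inputs can produce embeddings outside the training distribution, so the same score may reflect the model's reaction to unusual inputs as much as the quality of the salience ranking.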

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention