MASE: Interpretable NLP Models via Model-Agnostic Saliency Estimation
Zhou Yang, Shunyan Luo, Jiazhen Zhu, Fang Jin

TL;DR
MASE is a novel, model-agnostic framework for interpreting NLP models by estimating input saliency through perturbations on embeddings, providing local explanations without requiring internal model details.
Contribution
Introduces MASE, a new interpretability method for NLP models that uses perturbations on embeddings to generate local explanations without needing model internals.
Findings
MASE outperforms existing model-agnostic interpretation methods.
It effectively estimates input saliency in NLP models.
MASE's explanations achieve improved Delta Accuracy.
Abstract
Deep neural networks (DNNs) have made significant strides in Natural Language Processing (NLP), yet their interpretability remains elusive, particularly when evaluating their intricate decision-making processes. Traditional methods often rely on post-hoc interpretations, such as saliency maps or feature visualization, which might not be directly applicable to the discrete nature of word data in NLP. Addressing this, we introduce the Model-agnostic Saliency Estimation (MASE) framework. MASE offers local explanations for text-based predictive models without necessitating in-depth knowledge of a model's internal architecture. By leveraging Normalized Linear Gaussian Perturbations (NLGP) on the embedding layer instead of raw word inputs, MASE efficiently estimates input saliency. Our results indicate MASE's superiority over other model-agnostic interpretation methods, especially in terms of…
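To illustrate the general idea of estimating saliency by perturbing embeddings rather than raw words, here is a minimal sketch. It does not reproduce the paper's Normalized Linear Gaussian Perturbations, which are not defined in this summary. It assumes a simpler scheme in which each token's embedding receives Gaussian noise scaled by that embedding's norm, and saliency is the mean absolute change in the model's output. The function name `perturbation_saliency` and its parameters (`sigma`, `n_samples`) are hypothetical, and the model is treated purely as a black-box callable.

```python
import numpy as np


def perturbation_saliency(predict, embeddings, n_samples=200, sigma=0.1, seed=0):
    """Hypothetical sketch: per-token saliency of a black-box model from
    Gaussian perturbations of the embedding layer (not the paper's NLGP).

    predict: callable mapping an (n_tokens, dim) array to a scalar score.
    embeddings: (n_tokens, dim) array of input token embeddings.
    """
    rng = np.random.default_rng(seed)
    n_tokens, dim = embeddings.shape
    base = predict(embeddings)  # unperturbed model output

    # Assumed normalization: noise for token i has expected norm of about
    # sigma * ||e_i||, so tokens with larger embeddings get proportionally
    # larger perturbations.
    norms = np.linalg.norm(embeddings, axis=1)

    saliency = np.zeros(n_tokens)
    for i in range(n_tokens):
        deltas = []
        for _ in range(n_samples):
            noise = rng.normal(0.0, sigma, size=dim) * norms[i] / np.sqrt(dim)
            perturbed = embeddings.copy()
            perturbed[i] += noise  # perturb only token i's embedding
            deltas.append(abs(predict(perturbed) - base))
        # Saliency = average output sensitivity to perturbing token i
        saliency[i] = np.mean(deltas)

    # Normalize so scores sum to 1 (left unchanged if the model never reacts)
    total = saliency.sum()
    return saliency / total if total > 0 else saliency
```

As a usage example, a toy model `lambda X: float(X[0] @ w)` that reads only the first token's embedding receives all of its saliency on token 0, and exactly zero on the others. Perturbing the embedding layer keeps the input continuous, which is the property the abstract cites as the reason for not perturbing discrete word inputs directly.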
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
