MASE: Interpretable NLP Models via Model-Agnostic Saliency Estimation

Zhou Yang; Shunyan Luo; Jiazhen Zhu; Fang Jin

arXiv:2512.04386 · cs.CL · December 5, 2025

TL;DR

MASE is a novel, model-agnostic framework for interpreting NLP models by estimating input saliency through perturbations on embeddings, providing local explanations without requiring internal model details.

Contribution

Introduces MASE, a new interpretability method for NLP models that uses perturbations on embeddings to generate local explanations without needing model internals.

Findings

1. MASE outperforms existing model-agnostic interpretation methods.

2. It effectively estimates input saliency in NLP models.

3. MASE improves Delta Accuracy in explanations.

Abstract

Deep neural networks (DNNs) have made significant strides in Natural Language Processing (NLP), yet their interpretability remains elusive, particularly when evaluating their intricate decision-making processes. Traditional methods often rely on post-hoc interpretations, such as saliency maps or feature visualization, which might not be directly applicable to the discrete nature of word data in NLP. Addressing this, we introduce the Model-agnostic Saliency Estimation (MASE) framework. MASE offers local explanations for text-based predictive models without necessitating in-depth knowledge of a model's internal architecture. By leveraging Normalized Linear Gaussian Perturbations (NLGP) on the embedding layer instead of raw word inputs, MASE efficiently estimates input saliency. Our results indicate MASE's superiority over other model-agnostic interpretation methods, especially in terms of…
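
To make the idea of perturbing embeddings rather than raw words concrete, here is a minimal sketch in Python. It is not the authors' NLGP procedure, whose normalization and linear components are not specified in this abstract. It shows only a generic variant: add Gaussian noise to one token's embedding at a time and score the token by how much the model's output moves. The names `toy_predict`, `perturbation_saliency`, `W`, `sigma`, and `n_samples` are all hypothetical, and the "model" is a toy stand-in for a black-box classifier that accepts embeddings.

```python
import numpy as np

# Hypothetical weight vector for the toy black-box model (an assumption).
W = np.array([1.0, 0.0])

def toy_predict(emb_batch):
    """Toy black box: embeddings (n, T, D) -> positive-class probability (n,)."""
    # ReLU per token, summed over tokens, then a sigmoid with a fixed bias.
    logits = np.maximum(emb_batch @ W, 0.0).sum(axis=1) - 1.0
    return 1.0 / (1.0 + np.exp(-logits))

def perturbation_saliency(emb, predict, n_samples=200, sigma=0.5, seed=0):
    """Score each token by the mean absolute output change under Gaussian noise.

    Generic perturbation-based saliency, NOT the paper's NLGP.
    emb: (T, D) float array of token embeddings for one input.
    """
    rng = np.random.default_rng(seed)
    n_tokens, dim = emb.shape
    base = predict(emb[None])[0]  # unperturbed prediction
    scores = np.zeros(n_tokens)
    for t in range(n_tokens):
        # Copy the input n_samples times and perturb only token t.
        batch = np.repeat(emb[None], n_samples, axis=0)
        batch[:, t, :] += rng.normal(0.0, sigma, size=(n_samples, dim))
        scores[t] = np.abs(predict(batch) - base).mean()
    total = scores.sum()
    # Normalize so the scores sum to 1 when any token matters at all.
    return scores / total if total > 0 else scores

# Token 0 sits in the active region of the ReLU; tokens 1 and 2 are far
# inside the inactive region, so small noise cannot change the output.
emb = np.array([[2.0, 0.0], [-5.0, 0.0], [-5.0, 0.0]])
saliency = perturbation_saliency(emb, toy_predict)
print(saliency)
```

The sketch needs only forward passes through `predict`, which is what makes this family of methods model-agnostic: no gradients or internal weights are read. Perturbing in embedding space also sidesteps the discreteness of words, since a small continuous change to an embedding is well defined while a small change to a word is not.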

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning