Model Interpretability and Rationale Extraction by Input Mask Optimization

Marc Brinner; Sina Zarriess

arXiv:2508.11388 · cs.CL · August 18, 2025

TL;DR

This paper introduces a gradient-based input masking method that generates extractive explanations for neural network predictions on both text and images, without training a dedicated explanation model.

Contribution

The paper presents a novel, model-agnostic approach to rationale extraction that uses regularization to enforce three properties of the explanation: sufficiency, comprehensiveness, and compactness.

Findings

01 Effective explanations for NLP and image models

02 No need for training dedicated explanation models

03 High-quality rationale extraction demonstrated

Abstract

Concurrent with the rapid progress in the development of neural-network-based models in areas like natural language processing and computer vision, the need for explanations of the predictions of these black-box models has risen steadily. We propose a new method to generate extractive explanations for predictions made by neural networks. The method masks the parts of the input that the model does not consider indicative of the respective class. The masking is done using gradient-based optimization combined with a new regularization scheme that enforces sufficiency, comprehensiveness, and compactness of the generated explanation, three properties known to be desirable from the related field of rationale extraction in natural language processing. In this way, we bridge the gap between model interpretability and rationale extraction, thereby proving that the…
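To make the idea of gradient-based mask optimization concrete, here is a minimal sketch on a toy problem. It is not the authors' implementation. The linear logistic "model", the zero baseline for masked-out features, the specific loss terms, and all names (`optimize_mask`, `lam`, `lr`) are assumptions chosen for illustration. The loss has three parts: a sufficiency term (the kept input should still yield the class), a comprehensiveness term (the removed input should not), and a plain sparsity penalty standing in for compactness. The paper's actual regularization scheme is not specified in this summary and may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def class_logit(w, b, x):
    # Toy stand-in for a neural network: one linear layer for one class.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def optimize_mask(w, b, x, lam=0.1, lr=0.5, steps=300):
    """Gradient descent on mask logits theta, with mask m = sigmoid(theta).

    Assumed loss (illustrative only):
        -log p(m * x)            sufficiency: kept part predicts the class
        -log(1 - p((1 - m) * x)) comprehensiveness: removed part does not
        + lam * sum(m)           sparsity, a simple proxy for compactness
    """
    theta = [0.0] * len(x)  # every mask value starts at 0.5
    for _ in range(steps):
        m = [sigmoid(t) for t in theta]
        kept = [mi * xi for mi, xi in zip(m, x)]
        dropped = [(1.0 - mi) * xi for mi, xi in zip(m, x)]
        p_keep = sigmoid(class_logit(w, b, kept))
        p_drop = sigmoid(class_logit(w, b, dropped))
        # Shared factor from differentiating both log-likelihood terms.
        c = (1.0 - p_keep) + p_drop
        for i in range(len(x)):
            grad_m = lam - c * w[i] * x[i]  # d loss / d m_i
            theta[i] -= lr * grad_m * m[i] * (1.0 - m[i])  # chain rule via sigmoid
    return [sigmoid(t) for t in theta]


# Hypothetical example: features 0 and 3 carry the evidence for the class.
w = [2.0, -0.1, 0.05, 1.5]
b = -1.0
x = [1.0, 1.0, 1.0, 1.0]

mask = optimize_mask(w, b, x)
rationale = [i for i, mi in enumerate(mask) if mi > 0.5]
print(rationale)
```

In this toy setting the mask values for features 0 and 3 rise above 0.5 while the other two fall below it, so the extracted rationale is `[0, 3]`. For a real network the analytic gradient would be replaced by automatic differentiation, and the mask would be defined over tokens or pixels rather than raw features.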
