Empirical Characterization of Rationale Stability Under Controlled Perturbations for Explainable Pattern Recognition

Abu Noman Md Sakib; Zhensen Wang; Merjulah Roby; Zijie Zhang

arXiv:2604.04456·cs.AI·April 7, 2026

TL;DR

This paper introduces a metric for evaluating the stability of explanations in AI models. It assesses whether attribution patterns remain consistent across similar inputs, giving a measurable basis for judging how trustworthy a model's explanations are.

Contribution

The paper proposes an explanation consistency metric based on the cosine similarity of SHAP values and demonstrates it on transformer-based sentiment analysis models.

Findings

01 The metric can identify inconsistent model explanations effectively.

02 Experiments show the metric detects deviations from intended behavior.

03 The approach enhances understanding of model rationale stability.

Abstract

Reliable pattern recognition systems should exhibit consistent behavior across similar inputs, and their explanations should remain stable. However, most Explainable AI evaluations remain instance-centric and do not explicitly quantify whether attribution patterns are consistent across samples that share the same class or represent small variations of the same input. In this work, we propose a novel metric that assesses the consistency of model explanations, checking that models consistently reflect the intended objectives and remain consistent under label-preserving perturbations. We implement this metric using a pre-trained BERT model on the SST-2 sentiment analysis dataset, with additional robustness tests on RoBERTa, DistilBERT, and IMDB, applying SHAP to compute feature importance for various test samples. The proposed metric quantifies the cosine similarity of SHAP values for…
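
The abstract is cut off before it defines the metric, so what follows is only a minimal sketch of the general idea it names: comparing attribution vectors by cosine similarity. It assumes each sample's SHAP values have already been mapped onto a shared, fixed-length feature vector, and it averages the similarity over all pairs in a group (for example, samples of one class, or perturbed variants of one input). The function names, the pairwise averaging, and the example numbers are illustrative assumptions, not the paper's definition or results.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("attribution vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an empty attribution as dissimilar
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(attributions):
    """Average cosine similarity over every unordered pair of attribution vectors.

    Hypothetical aggregate: the paper may combine similarities differently.
    """
    pairs = list(combinations(attributions, 2))
    if not pairs:
        raise ValueError("need at least two attribution vectors")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Made-up attribution vectors over three shared features (not real SHAP output).
stable_group = [[0.9, 0.1, -0.2], [0.8, 0.2, -0.1], [1.0, 0.0, -0.3]]
unstable_group = [[0.9, 0.1, -0.2], [-0.7, 0.3, 0.4], [0.0, -0.9, 0.1]]

print("stable group:  ", round(mean_pairwise_similarity(stable_group), 3))
print("unstable group:", round(mean_pairwise_similarity(unstable_group), 3))
```

A score near 1 would indicate that the samples in a group are explained by the same features with the same sign, while scores near 0 or below suggest the explanations disagree. In practice, token-level SHAP values from different sentences do not line up position by position, so some alignment step (here simply assumed) is needed before vectors can be compared.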

Code & Models

Repositories

anmspro/ESS-XAI-Stability (GitHub)