Fixed Point Explainability
Emanuele La Malfa, Jon Vadillo, Marco Molinari, Michael Wooldridge

TL;DR
This paper proposes a formal framework for fixed point explanations that assess model explainability through recursive stability, revealing hidden behaviors and weaknesses in models including large language models.
Contribution
It introduces a novel formal notion of fixed point explanations, defining convergence conditions for various explainers and demonstrating their effectiveness on multiple datasets and models.
Findings
Fixed point explanations satisfy minimality, stability, and faithfulness.
Convergence conditions are established for feature-based and mechanistic explainers.
Quantitative and qualitative results demonstrate the approach's effectiveness.
Abstract
This paper introduces a formal notion of fixed point explanations, inspired by the "why regress" principle, to assess, through recursive applications, the stability of the interplay between a model and its explainer. Fixed point explanations satisfy properties like minimality, stability, and faithfulness, revealing hidden model behaviours and explanatory weaknesses. We define convergence conditions for several classes of explainers, from feature-based to mechanistic tools like Sparse AutoEncoders, and we report quantitative and qualitative results for several datasets and models, including LLMs such as Llama-3.3-70B.
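The recursive application described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation. It assumes a toy linear scorer with weights `w` and a hypothetical masking explainer that returns an object in the same space as its input. The names `explainer`, `fixed_point_explanation`, and `min_share` are illustrative only.

```python
import numpy as np

def explainer(x, w, min_share=0.1):
    """Hypothetical feature-based explainer (an assumption, not from the paper).

    Scores each feature by |w_i * x_i| and zeroes every feature whose share
    of the total importance falls below `min_share`, so the explanation is a
    masked copy of the input and can be fed back into the explainer.
    """
    importance = np.abs(w * x)
    total = importance.sum()
    if total == 0.0:
        # Nothing left to attribute: the input is returned unchanged.
        return x.copy()
    keep = importance / total >= min_share
    return np.where(keep, x, 0.0)

def fixed_point_explanation(x, w, max_iters=50):
    """Re-apply the explainer to its own output until the output stops changing.

    Returns (last explanation, list of all iterates, whether a fixed point
    was reached within `max_iters` applications).
    """
    trajectory = [x.copy()]
    for _ in range(max_iters):
        nxt = explainer(trajectory[-1], w)
        if np.array_equal(nxt, trajectory[-1]):
            return trajectory[-1], trajectory, True
        trajectory.append(nxt)
    return trajectory[-1], trajectory, False

# Usage: explain a 4-feature input under a toy linear scorer.
x0 = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, -1.0, 2.0, 0.1])
fixed_point, trajectory, converged = fixed_point_explanation(x0, w)
```

For this toy explainer the set of non-zero features can only shrink, so the loop terminates after at most as many steps as there are features. Real explainers such as those studied in the paper need their own convergence conditions.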
Peer Reviews
Decision: Submitted to ICLR 2026
1. The concept of a "fixed point explanation," based on recursively applying an explainer to its own output (the "why regress" principle), is an interesting new way to think about and evaluate the stability of XAI methods.
2. The paper's ambition in applying this single framework across three very different and timely classes of explainers (feature-based, prototype-based, and mechanistic) demonstrates the potential generality of the fixed-point concept.
1. **Unclear Core Methodology and Role of the Support Function**: I found the paper's central definition difficult to follow. The main methodology in Section 2 defines the recursive step as $x_k = \epsilon(x_{k-1}; f)$, which implies the explainer $\epsilon$ outputs a new object in the same format as the original input $x$. However, for feature-based explainers like LIME or SHAP (Section 3.1), the explainer's output isn't a new image, but rather a set of feature importances or a heatmap (which t
The authors present solid arguments for analyzing explanation methods through the “why regress” principle, focusing on consistency and stability as key properties. The proposed framework is explored with feature-based explainers on vision models such as VGG16 and on transformer-based language models, showing some versatility in application.
The work is limited to saliency or feature-attribution-based explanation methods, most of which are relatively dated (5–7 years old). While understanding their properties is intellectually useful, the motivation feels weak in today’s context. Modern vision-language models no longer rely on such feature-based explanations, raising questions about the current relevance of the work. The description of the SAE’s application on transformer models lacks clarity. In particular, it is not well explaine
The idea to apply fixed-point analysis to explanation algorithms is, to the best of my knowledge, novel, and I find this idea interesting. The idea is also somewhat innovative and could be of potential interest to the community. The mathematical exposition in the paper is sound. While most of the analysis appears to be relatively straightforward, given that the concept of fixed points is well-studied in mathematics, the overall mathematical exposition is coherent. The concept of fixed-point a
The main weakness of the paper is that it does not contain a single empirical example where the benefits of the proposed fixed-point approach for explainability are apparent. In Figure 2, the fixed-point explanation looks just like the original explanation. In Figure 3, we can see the dynamics of iterating, but what is the point of obtaining the final image and heatmap, especially since the final image is different from the original image, and the class label is now different as well? What is the
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Bayesian Modeling and Causal Inference
