Causality for Inherently Explainable Transformers: CAT-XPLAIN
Subash Khanal, Benjamin Brodie, Xin Xing, Ai-Ling Lin, Nathan Jacobs

TL;DR
This paper introduces an inherently explainable transformer that uses causal explanations to identify the top-k input regions contributing to each decision, and reports better explainability than a post-hoc causal explainer on binary image classification tasks.
Contribution
The paper presents a novel transformer architecture that is inherently explainable through causal reasoning, eliminating the need for separate post-hoc explanation models.
Findings
Better explainability results compared to post-hoc causal explainers
Achieves inherent interpretability without training a separate explainer model
Evaluated on binary classification tasks using three image datasets: MNIST, FMNIST, and CIFAR
Abstract
There have been several post-hoc explanation approaches developed to explain pre-trained black-box neural networks. However, there is still a gap in research efforts toward designing neural networks that are inherently explainable. In this paper, we utilize a recently proposed instance-wise post-hoc causal explanation method to make an existing transformer architecture inherently explainable. Once trained, our model provides an explanation in the form of top-k regions in the input space of the given instance contributing to its decision. We evaluate our method on binary classification tasks using three image datasets: MNIST, FMNIST, and CIFAR. Our results demonstrate that compared to the causality-based post-hoc explainer model, our inherently explainable model achieves better explainability results while eliminating the need of training a separate explainer model. Our code is…
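To make the idea of a top-k region explanation concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name `top_k_patches`, the per-patch `scores`, and the row-major patch grid are all assumptions made for illustration, and the abstract does not say how the model computes its region scores. The sketch only shows the final selection step, picking the k highest-scoring patches and mapping them back to grid positions.

```python
from typing import List, Tuple


def top_k_patches(scores: List[float], k: int, grid_width: int) -> List[Tuple[int, int]]:
    """Return (row, col) grid positions of the k highest-scoring patches.

    `scores` is a hypothetical list with one importance score per image
    patch, laid out in row-major order on a grid `grid_width` patches wide.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of patches")
    # Rank patch indices by score, highest first. Python's sort is stable,
    # so tied patches keep their original (lower index first) order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Convert each flat patch index into a (row, col) position on the grid.
    return [divmod(i, grid_width) for i in ranked[:k]]


# Usage: a 2x2 patch grid with made-up scores, keeping the top 2 patches.
example = top_k_patches([0.1, 0.7, 0.05, 0.15], k=2, grid_width=2)
print(example)
```

In this toy example the selected positions are the top-right and bottom-right patches. In a real model the scores would come from whatever selection mechanism the architecture learns during training.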
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
