Causality for Inherently Explainable Transformers: CAT-XPLAIN

Subash Khanal; Benjamin Brodie; Xin Xing; Ai-Ling Lin; Nathan Jacobs

arXiv:2206.14841 · cs.CV · July 1, 2022 · 1 citation

TL;DR

This paper makes an existing transformer architecture inherently explainable by building on an instance-wise post-hoc causal explanation method. The trained model returns the top-k input regions contributing to each decision, and it achieves better explainability results than the causality-based post-hoc explainer on binary image classification tasks.

Contribution

The paper adapts an existing transformer architecture so that it is inherently explainable through causal explanations, eliminating the need to train a separate post-hoc explainer model.

Findings

01 Better explainability results than the causality-based post-hoc explainer model

02 Inherently explainable without training a separate explainer model

03 Evaluated on binary classification tasks using three image datasets: MNIST, FMNIST, and CIFAR

Abstract

Several post-hoc explanation approaches have been developed to explain pre-trained black-box neural networks. However, there is still a gap in research efforts toward designing neural networks that are inherently explainable. In this paper, we utilize a recently proposed instance-wise post-hoc causal explanation method to make an existing transformer architecture inherently explainable. Once trained, our model provides an explanation in the form of the top-$k$ regions in the input space of the given instance that contribute to its decision. We evaluate our method on binary classification tasks using three image datasets: MNIST, FMNIST, and CIFAR. Our results demonstrate that, compared to the causality-based post-hoc explainer model, our inherently explainable model achieves better explainability results while eliminating the need to train a separate explainer model. Our code is…
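
To make the idea of a top-$k$ region explanation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the image is split into a square grid of equally sized patches and that some model has already produced one importance score per patch. The function names `top_k_patches` and `patch_mask`, the row-major patch ordering, and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def top_k_patches(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring patches, best first."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of patches")
    # Sort by descending score; ties are broken by the lower patch index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def patch_mask(selected: Sequence[int], grid: int, patch: int) -> List[List[int]]:
    """Build a square 0/1 pixel mask that covers the selected patches.

    Assumes a grid x grid layout of patch x patch pixel blocks,
    indexed in row-major order (an assumption of this sketch).
    """
    side = grid * patch
    mask = [[0] * side for _ in range(side)]
    for idx in selected:
        row, col = divmod(idx, grid)  # patch position within the grid
        for r in range(row * patch, (row + 1) * patch):
            for c in range(col * patch, (col + 1) * patch):
                mask[r][c] = 1
    return mask


# Hypothetical scores for a 2x2 grid of 2x2-pixel patches.
scores = [0.10, 0.70, 0.05, 0.15]
selected = top_k_patches(scores, k=2)
mask = patch_mask(selected, grid=2, patch=2)
```

With these example scores, the two selected patches are the ones in the right-hand column of the grid, so the mask keeps the right half of the image. Multiplying an image by such a mask is one simple way to show only the regions an explanation highlights.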

Code & Models

Repositories

mvrl/cat-xplain (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications