Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers

Shaobo Wang; Hongxuan Tang; Mingyang Wang; Hongrui Zhang; Xuyang Liu; Weiya Li; Xuming Hu; Linfeng Zhang

arXiv:2410.21815·cs.LG·February 26, 2025

Open Access

TL;DR

This paper introduces AutoGnothi, a parameter-efficient method that integrates a small side network into a black-box transformer so the model can generate Shapley value explanations of its own predictions, without changing the original network or compromising prediction accuracy.

Contribution

AutoGnothi is a parameter-efficient pipeline that enables black-box models to generate Shapley value explanations without altering original parameters, improving interpretability and efficiency.

Findings

01. AutoGnothi provides accurate explanations for vision and language tasks.

02. It significantly reduces memory, training, and inference costs.

03. AutoGnothi outperforms traditional parameter-efficient methods.

Abstract

The debate between self-interpretable models and post-hoc explanations for black-box models is central to Explainable AI (XAI). Self-interpretable models, such as concept-based networks, offer insights by connecting decisions to human-understandable concepts but often struggle with performance and scalability. Conversely, post-hoc methods like Shapley values, while theoretically robust, are computationally expensive and resource-intensive. To bridge the gap between these two lines of research, we propose a novel method that combines their strengths, providing theoretically guaranteed self-interpretability for black-box models without compromising prediction accuracy. Specifically, we introduce a parameter-efficient pipeline, AutoGnothi, which integrates a small side network into the black-box model, allowing it to generate Shapley value explanations without changing the original network…
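
To make the cost claim about Shapley values concrete, here is a minimal sketch of exact Shapley value computation on a toy three-feature game. It is not the paper's method: `shapley_values` and `toy_value` are hypothetical names introduced for illustration, and `toy_value` stands in for a model's output on a subset of features. The exact computation visits every coalition of the other features for each feature, so the number of model evaluations grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for an n-feature game.

    `value` maps a frozenset of feature indices to a float
    (a stand-in for the model output on that feature subset).
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(s):
    """Hypothetical game: additive scores plus an interaction of features 0 and 1."""
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(base[j] for j in s)
    if 0 in s and 1 in s:
        total += 4.0  # interaction term, shared between features 0 and 1
    return total


phi = shapley_values(3, toy_value)
print(phi)
```

In this toy game the interaction bonus is split evenly between features 0 and 1, and the attributions sum to the value of the full feature set minus the value of the empty set. The nested loop over coalitions is what makes the exact computation impractical for inputs with many tokens or patches.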

Taxonomy

Topics: Interpreting and Communication in Healthcare