Interpretability Needs a New Paradigm

Andreas Madsen; Himabindu Lakkaraju; Siva Reddy; Sarath Chandar

arXiv:2405.05386 · cs.LG · November 14, 2024 · 2 cites

Open Access

TL;DR

This paper argues that interpretability in AI needs new paradigms beyond the intrinsic and post-hoc ones, emphasizes faithfulness, and proposes three emerging paradigms to improve explanation reliability.

Contribution

The paper identifies three emerging paradigms for interpretability, each focused on designing models so that explanations are more faithful and of higher quality.

Findings

1. Current paradigms have limitations in ensuring faithfulness.
2. Three emerging paradigms are proposed for better interpretability.
3. Evolving paradigms can improve trustworthiness of AI explanations.

Abstract

Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post-hoc paradigm, which believes that black-box models can be explained. At the core of this debate is how each paradigm ensures its explanations are faithful, i.e., true to the model's behavior. This is important, as false but convincing explanations lead to unsupported confidence in artificial intelligence (AI), which can be dangerous. This paper's position is that we should think about new paradigms while staying vigilant regarding faithfulness. First, by examining the history of paradigms in science, we see that paradigms are constantly evolving. Then, by examining the current paradigms, we can understand their underlying beliefs,…

Taxonomy

Topics: Interpreting and Communication in Healthcare