Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

Emanuele Marconato; Andrea Passerini; Stefano Teso

arXiv:2309.07742·cs.LG·September 15, 2023

TL;DR

This paper introduces a causal framework for human-interpretable representation learning that models the human as an external observer, linking interpretability, alignment, and disentanglement through a formal information-theoretic approach.

Contribution

The paper proposes a mathematical framework for acquiring interpretable representations that explicitly incorporates the human perspective and unifies notions such as alignment, disentanglement, and concept leakage.

Findings

01 Defines a formal notion of alignment between machine and human concepts.

02 Links alignment to disentanglement and concept leakage.

03 Provides an information-theoretic reformulation of interpretability properties.

Abstract

Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare