Post-hoc Concept Bottleneck Models

Mert Yuksekgonul; Maggie Wang; James Zou

arXiv:2205.15480 · cs.LG · February 3, 2023 · 37 cites

TL;DR

Post-hoc Concept Bottleneck Models (PCBMs) transform any neural network into an interpretable model without sacrificing accuracy, enabling concept transfer, debugging, and global model edits for improved generalization.

Contribution

We introduce PCBMs, a method to convert existing neural networks into interpretable models without performance loss, and demonstrate concept transfer and efficient model editing.

Findings

01 PCBMs match neural network accuracy while providing interpretability.

02 Concept transfer from other datasets or language descriptions is effective.

03 Model editing via concept feedback improves performance without retraining.

Abstract

Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts ("the bottleneck") and use the concepts to make predictions. A concept bottleneck enhances interpretability since it can be investigated to understand what concepts the model "sees" in an input and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require dense concept annotations in the training data to learn the bottleneck. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck Models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available on…
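The bottleneck structure the abstract describes (inputs mapped to concept scores, then a predictor that sees only those scores) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the concept scores are computed as a plain linear projection of a feature vector onto a hypothetical `concept_bank`, the predictor is a single linear layer, and the concept names, weights, and inputs are toy values.

```python
import numpy as np


def concept_scores(features, concept_bank):
    """Project features (n, d) onto concept vectors (k, d); returns (n, k)."""
    return features @ concept_bank.T


def predict(features, concept_bank, weights, bias):
    """Linear predictor that only sees concept scores; returns logits (n, c)."""
    return concept_scores(features, concept_bank) @ weights + bias


def concept_contributions(feature, concept_bank, weights, label):
    """Per-concept contribution to one class logit for a single input (d,)."""
    scores = concept_scores(feature[None, :], concept_bank)[0]
    return scores * weights[:, label]


# Toy values (hypothetical): 2 concepts, 3-dimensional features, 2 classes.
concept_names = ["striped", "furry"]
concept_bank = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])
weights = np.array([[2.0, -1.0],
                    [0.5, 1.5]])  # shape (concepts, classes)
bias = np.zeros(2)

x = np.array([[3.0, 1.0, 5.0]])
logits = predict(x, concept_bank, weights, bias)
contrib = concept_contributions(x[0], concept_bank, weights, label=0)

for name, value in zip(concept_names, contrib):
    print(f"{name}: {value:+.2f}")
```

Because the predictor is linear in the concept scores, each class logit decomposes into one term per concept, which is what makes it possible to inspect which concepts are deemed important for a given prediction.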

Code & Models

Datasets

anonymous347928/pcbm_metashift
dataset · 38 dl

Videos

Post-hoc Concept Bottleneck Models · slideslive

Taxonomy

Topics: Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare