Post-hoc Concept Bottleneck Models
Mert Yuksekgonul, Maggie Wang, James Zou

TL;DR
Post-hoc Concept Bottleneck Models (PCBMs) transform any neural network into an interpretable model without sacrificing accuracy, enabling concept transfer, debugging, and global model edits for improved generalization.
Contribution
We introduce PCBMs, a method to convert existing neural networks into interpretable models without performance loss, and demonstrate concept transfer and efficient model editing.
Findings
PCBMs match neural network accuracy while providing interpretability.
Concept transfer from other datasets or language descriptions is effective.
Model editing via concept feedback improves performance without retraining.
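One simple form a concept-level edit could take is sketched below: if the final predictor is a linear layer over concept scores, zeroing the weights of a concept judged spurious changes predictions without any retraining. This is an illustrative sketch, not the authors' procedure. The concept names, class names, weights, and the `prune_concept` helper are all hypothetical.

```python
import numpy as np

# Hypothetical concepts and classes, chosen only for illustration.
concepts = ["fur", "snow"]
classes = ["dog", "wolf"]

# Hypothetical linear head: rows are concepts, columns are classes.
# Here the "snow" concept strongly (and spuriously) favors "wolf".
W = np.array([
    [1.0, 0.8],   # fur
    [-0.5, 1.5],  # snow
])

def prune_concept(weights, concept_index):
    """Return a copy of the weights with one concept's row set to zero."""
    edited = weights.copy()
    edited[concept_index, :] = 0.0
    return edited

# Concept scores for a single input: a dog photographed in snow.
scores = np.array([1.0, 1.0])

pred_before = int(np.argmax(scores @ W))
W_edited = prune_concept(W, concepts.index("snow"))
pred_after = int(np.argmax(scores @ W_edited))

print(classes[pred_before], "->", classes[pred_after])
```

Because the edit touches only the interpretable layer, it applies globally to every input that activates the pruned concept.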
Abstract
Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts ("the bottleneck") and use the concepts to make predictions. A concept bottleneck enhances interpretability since it can be investigated to understand what concepts the model "sees" in an input and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require dense concept annotations in the training data to learn the bottleneck. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck Models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available on…
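The two-stage structure the abstract describes (inputs mapped to concept scores, then concept scores used for prediction) can be sketched on top of a frozen backbone's embeddings. This is a minimal sketch under stated assumptions, not the paper's implementation: the embeddings and concept directions are random synthetic data, and the ridge least-squares head is a stand-in for whatever interpretable predictor is actually used.

```python
import numpy as np

def concept_scores(embeddings, concept_vectors):
    """Project embeddings onto unit-normalized concept directions."""
    norms = np.linalg.norm(concept_vectors, axis=1, keepdims=True)
    return embeddings @ (concept_vectors / norms).T

def add_bias(scores):
    """Append a constant column so the linear head has an intercept."""
    return np.hstack([scores, np.ones((scores.shape[0], 1))])

def fit_linear_head(scores, labels, n_classes, ridge=1e-3):
    """Ridge least-squares fit to one-hot labels (an assumed, simple choice)."""
    X = add_bias(scores)
    Y = np.eye(n_classes)[labels]
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

def predict(scores, head):
    """Predict the class with the largest linear output."""
    return (add_bias(scores) @ head).argmax(axis=1)

# Synthetic stand-ins for backbone embeddings and concept directions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
concept_vectors = rng.normal(size=(3, 16))

# The bottleneck: each input is summarized by 3 concept scores.
scores = concept_scores(embeddings, concept_vectors)

# Synthetic labels that depend only on the first concept.
labels = (scores[:, 0] > 0).astype(int)

head = fit_linear_head(scores, labels, n_classes=2)
accuracy = float((predict(scores, head) == labels).mean())
print(f"train accuracy: {accuracy:.2f}")
```

Each row of `head` (except the bias row) corresponds to one concept, so inspecting its entries shows which concepts drive each class prediction.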
Taxonomy
Topics: Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
