Concept Bottleneck Models
Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

TL;DR
This paper explores concept bottleneck models, which predict human-specified high-level concepts before the final output. This design enables interpretation and human intervention, and the paper demonstrates its effectiveness on x-ray grading and bird identification tasks.
Contribution
The paper revisits concept bottleneck models, showing that they can achieve accuracy competitive with standard end-to-end models while supporting interpretation in terms of concepts and human correction of those concepts at test time.
Findings
Competitive accuracy with end-to-end models
Enhanced interpretability through high-level concepts
Improved accuracy with human corrections at test time
Abstract
We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthritis severity). We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label. By construction, we can intervene on these concept bottleneck models by editing their predicted concept values and propagating these changes to the final prediction. On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models, while enabling interpretation in terms of…
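The two-stage structure and the intervention described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the logistic concept predictor, the linear label predictor, all function names, and all weight values are hypothetical and chosen only to show how editing a predicted concept propagates to the final prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(x, concept_weights, concept_biases):
    """Stage 1 (input -> concepts): one logistic unit per concept."""
    return [
        sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(concept_weights, concept_biases)
    ]


def predict_label(concepts, label_weights, label_bias):
    """Stage 2 (concepts -> label): the label sees the input only via concepts."""
    return sum(w * c for w, c in zip(label_weights, concepts)) + label_bias


def intervene(concepts, corrections):
    """Return a copy of the concepts with selected entries replaced by
    human-provided values, given as {concept_index: value}."""
    edited = list(concepts)
    for idx, value in corrections.items():
        edited[idx] = value
    return edited


# Hypothetical toy parameters (assumed for illustration, not from the paper).
CONCEPT_WEIGHTS = [[2.0, 0.0], [0.0, 2.0]]  # e.g. concept 0 = "bone spur present"
CONCEPT_BIASES = [0.0, 0.0]
LABEL_WEIGHTS = [3.0, 1.0]                  # severity score rises with each concept
LABEL_BIAS = 0.0

x = [1.5, -1.0]  # stand-in for features extracted from raw input
c_hat = predict_concepts(x, CONCEPT_WEIGHTS, CONCEPT_BIASES)
y_hat = predict_label(c_hat, LABEL_WEIGHTS, LABEL_BIAS)

# Intervention: a human states that concept 0 is absent, and the change
# is propagated through the second stage only.
c_edit = intervene(c_hat, {0: 0.0})
y_edit = predict_label(c_edit, LABEL_WEIGHTS, LABEL_BIAS)

print("predicted concepts:", c_hat, "-> score:", y_hat)
print("edited concepts:   ", c_edit, "-> score:", y_edit)
```

Because the label predictor takes only the concept vector as input, the question posed in the abstract ("if the model did not think there was a bone spur, would it still predict severe arthritis?") reduces to recomputing the second stage on the edited concepts.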
Taxonomy
Topics: Machine Learning in Bioinformatics · Metabolomics and Mass Spectrometry Studies · Machine Learning in Healthcare
