Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
Sonia Laguna, Ričards Marcinkevičs, Moritz Vandenhirtz, Julia E. Vogt

TL;DR
This paper presents a method to enhance the intervenability of pretrained black-box neural networks through fine-tuning, enabling effective concept-based interventions even with limited concept labels, and demonstrates improved calibration and utility in medical imaging.
Contribution
It introduces a novel approach to make black-box models more intervenable by fine-tuning with limited concept labels, extending interpretability techniques beyond concept bottleneck models.
Findings
Fine-tuning improves intervention effectiveness across various architectures.
Black-box models can be made more intervenable than traditional CBMs.
Methods remain effective with vision-language concept annotations.
Abstract
Recently, interpretable machine learning has re-explored concept bottleneck models (CBM). An advantage of this model class is the user's ability to intervene on predicted concept values, affecting the downstream output. In this work, we introduce a method to perform such concept-based interventions on pretrained neural networks, which are not interpretable by design, only given a small validation set with concept labels. Furthermore, we formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We focus on backbone architectures of varying complexity, from simple, fully connected neural nets to Stable Diffusion. We demonstrate that the proposed fine-tuning improves…
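To make the idea of a concept-based intervention concrete, here is a minimal sketch on a toy concept bottleneck model. It is not the authors' method or code. The weights `W`, `v`, `b`, the input `x`, and the helper names are hypothetical, and the loss gap at the end is only a simple proxy for how effective an intervention is, not the paper's formal definition of intervenability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_concepts(x, W):
    # Concept predictor: one sigmoid unit per concept (hypothetical linear model).
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]

def predict_target(c, v, b):
    # Target predictor that sees only the concept values.
    return sigmoid(sum(vk * ck for vk, ck in zip(v, c)) + b)

def intervene(c_hat, c_true, idx):
    # Replace the predicted concepts at positions `idx` with user-provided values.
    return [c_true[k] if k in idx else c_hat[k] for k in range(len(c_hat))]

def bce(y, p, eps=1e-12):
    # Binary cross-entropy for a single example.
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

# Hypothetical toy parameters and a single labelled example.
W = [[2.0, -1.0], [-1.5, 0.5]]
v, b = [3.0, -3.0], 0.0
x, c_true, y = [0.1, 0.9], [1.0, 0.0], 1.0

c_hat = predict_concepts(x, W)
p_before = predict_target(c_hat, v, b)

# The user corrects both concepts; the downstream prediction changes accordingly.
c_int = intervene(c_hat, c_true, {0, 1})
p_after = predict_target(c_int, v, b)

# Assumed proxy for intervention effectiveness: reduction in target loss.
gap = bce(y, p_before) - bce(y, p_after)
print(f"p_before={p_before:.3f} p_after={p_after:.3f} loss_gap={gap:.3f}")
```

In a CBM the concepts are an explicit layer, so the swap in `intervene` is trivial. For a black box there is no such layer, which is the gap the paper addresses using a small validation set with concept labels.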
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Data Stream Mining Techniques · Machine Learning in Healthcare
Methods: Sparse Evolutionary Training · Focus · Diffusion
