Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding
Nuoye Xiong, Anqi Dong, Ning Wang, Cong Hua, Guangming Zhu, Lin Mei, Peiyi Shen, Liang Zhang

TL;DR
This paper introduces CBM-HNMU, a method that improves the interpretability and accuracy of deep neural networks by intervening at the concept level, refining detrimental concepts, and distilling corrections back into the model.
Contribution
It proposes a novel concept bottleneck framework that automatically identifies and refines harmful concepts to enhance human understanding and model performance.
Findings
Achieved up to 2.64% accuracy improvement on various datasets.
Enhanced interpretability by approximating black-box reasoning with concepts.
Demonstrated effectiveness across CNN and transformer models.
Abstract
Recent advances in deep learning have led to increasingly complex models with deeper layers and more parameters, reducing interpretability and making their decisions harder to understand. While many methods explain black-box reasoning, most lack effective interventions or only operate at sample-level without modifying the model itself. To address this, we propose the Concept Bottleneck Model for Enhancing Human-Neural Network Mutual Understanding (CBM-HNMU). CBM-HNMU leverages the Concept Bottleneck Model (CBM) as an interpretable framework to approximate black-box reasoning and communicate conceptual understanding. Detrimental concepts are automatically identified and refined (removed/replaced) based on global gradient contributions. The modified CBM then distills corrected knowledge back into the black-box model, enhancing both interpretability and accuracy. We evaluate CBM-HNMU on…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBayesian Modeling and Causal Inference · Advanced Text Analysis Techniques · Big Data Technologies and Applications
