Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat
Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich

TL;DR
This paper introduces a method to decompose a Blackbox model into a mixture of interpretable models and residuals, enabling better explanations, concept discovery, and improved test-time interventions without sacrificing performance.
Contribution
It proposes an iterative approach to carve out interpretable experts from a Blackbox, enhancing interpretability and explanation quality while maintaining predictive accuracy.
Findings
Identifies diverse, instance-specific concepts with high completeness.
Effectively isolates harder-to-explain samples via residuals.
Outperforms interpretable-by-design models in test-time interventions.
Abstract
ML model design either starts with an interpretable model or starts with a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible and to underperform their Blackbox variants. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain…
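The route, interpret, repeat loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper learns selectors, FOL-based experts, and a residual network, whereas the sketch below swaps in a simple greedy stand-in that routes a sample to a rule whenever that rule agrees with the Blackbox label. The names `carve_experts`, `candidate_rules`, and `min_coverage`, and the binary concept encoding, are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[int, ...]        # binary concept vector (hypothetical encoding)
Rule = Callable[[Sample], int]  # stand-in for an interpretable expert


def carve_experts(
    samples: Sequence[Sample],
    blackbox_labels: Sequence[int],
    candidate_rules: Sequence[Rule],
    min_coverage: float = 0.9,
) -> Tuple[List[Tuple[Rule, List[int]]], List[int]]:
    """Greedy stand-in for the iterative carving loop.

    Returns (experts, residual): each expert is a rule paired with the
    indices of the samples routed to it; residual holds the indices that
    no rule has explained yet.
    """
    remaining = list(range(len(samples)))
    unused = list(candidate_rules)
    experts: List[Tuple[Rule, List[int]]] = []
    target = min_coverage * len(samples)
    covered = 0

    while remaining and unused and covered < target:
        # Interpret: find the rule that agrees with the Blackbox on the
        # largest share of the samples still held by the residual.
        def agreed(rule: Rule) -> List[int]:
            return [i for i in remaining if rule(samples[i]) == blackbox_labels[i]]

        best = max(unused, key=lambda r: len(agreed(r)))
        routed = agreed(best)
        if not routed:
            break  # no remaining rule explains anything; stop carving

        experts.append((best, routed))
        unused.remove(best)
        covered += len(routed)

        # Route: everything the rule did not explain stays in the residual,
        # and the loop repeats on that residual.
        routed_set = set(routed)
        remaining = [i for i in remaining if i not in routed_set]

    return experts, remaining
```

Calling `carve_experts` with a handful of concept vectors, the Blackbox's labels for them, and a few candidate rules returns the carved experts along with the indices left in the residual. Lowering `min_coverage` stops the loop earlier and leaves more samples in the residual.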
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Software Engineering Research
Methods: High-Order Consensuses
