Implicit Mixture of Interpretable Experts for Global and Local Interpretability
Nathan Elazar, Kerry Taylor

TL;DR
This paper introduces IMoIE, a model that pairs a black-box router with inherently interpretable experts for accurate and interpretable image classification. It addresses "cheating", where the router solves the classification problem by itself and each expert collapses to a constant function, and it enables analysis of both local and global interpretability.
Contribution
The paper proposes a novel implicit parameterization scheme for mixtures of interpretable experts that supports arbitrary numbers of experts, together with interpretable routers that the black-box router is trained to match.
Findings
IMoIE matches state-of-the-art accuracy on MNIST10.
IMoIE provides local interpretability for individual decisions.
Global interpretability is achievable with some accuracy trade-offs.
Abstract
We investigate the feasibility of using mixtures of interpretable experts (MoIE) to build interpretable image classifiers on MNIST10. MoIE uses a black-box router to assign each input to one of many inherently interpretable experts, thereby providing insight into why a particular classification decision was made. We find that a naively trained MoIE will learn to 'cheat', whereby the black-box router will solve the classification problem by itself, with each expert simply learning a constant function for one particular class. We propose to solve this problem by introducing interpretable routers and training the black-box router's decisions to match the interpretable router. In addition, we propose a novel implicit parameterization scheme that allows us to build mixtures of arbitrary numbers of experts, allowing us to study how classification performance, local and global interpretability…
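The routing idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the linear experts, the function names `moie_predict` and `router_matching_loss`, and the cross-entropy form of the matching loss are all assumptions made for illustration, and the implicit parameterization scheme is not modeled at all. The sketch only shows hard routing of each input to one expert, plus one plausible way to penalize disagreement between a black-box router and an interpretable router.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def moie_predict(x, router_logits_fn, expert_weights, expert_biases):
    """Hard-route each input to one linear expert (assumed expert form).

    x:                (n, d) inputs
    router_logits_fn: callable (n, d) -> (n, k) routing scores (the black box)
    expert_weights:   (k, d, c) one weight matrix per expert
    expert_biases:    (k, c) one bias vector per expert
    Returns (predicted class per input, chosen expert per input).
    """
    choice = router_logits_fn(x).argmax(axis=1)   # (n,) selected expert
    W = expert_weights[choice]                    # (n, d, c)
    b = expert_biases[choice]                     # (n, c)
    logits = np.einsum("nd,ndc->nc", x, W) + b    # per-input expert output
    return logits.argmax(axis=1), choice


def router_matching_loss(black_box_logits, interpretable_logits):
    """Assumed loss: cross-entropy between the black-box routing
    distribution and the interpretable router's hard choice."""
    target = interpretable_logits.argmax(axis=1)
    p = softmax(black_box_logits)
    return -np.log(p[np.arange(len(target)), target] + 1e-12).mean()
```

In this sketch, the "cheating" solution corresponds to experts whose weights are zero and whose biases are one-hot: the prediction then equals the router's choice and the experts explain nothing about the input.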
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
