Intrinsic User-Centric Interpretability through Global Mixture of Experts
Vinitra Swamy, Syrielle Montariol, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser

TL;DR
This paper introduces InterpretCC, an intrinsically interpretable neural network model that uses a global mixture of experts to provide human-friendly explanations while maintaining high predictive performance.
Contribution
The paper proposes InterpretCC, a novel global mixture-of-experts model that enhances interpretability and actionability of explanations without sacrificing accuracy.
Findings
InterpretCC achieves comparable accuracy to state-of-the-art models.
It provides explanations that are more actionable and useful for users.
A user study shows higher perceived usefulness of InterpretCC explanations.
Abstract
In human-centric settings like education or healthcare, model accuracy and model explainability are key factors for user adoption. Towards these two goals, intrinsically interpretable deep learning models have gained popularity, focusing on accurate predictions alongside faithful explanations. However, there exists a gap in the human-centeredness of these approaches, which often produce nuanced and complex explanations that are not easily actionable for downstream users. We present InterpretCC (interpretable conditional computation), a family of intrinsically interpretable neural networks at a unique point in the design space that optimizes for ease of human understanding and explanation faithfulness, while maintaining comparable performance to state-of-the-art models. InterpretCC achieves this through adaptive sparse activation of features before prediction, allowing the model to use a…
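The abstract describes adaptive sparse activation of features before prediction. The Python below is a minimal sketch of that idea, not the authors' implementation. It assumes per-feature gate logits are already given (in a real model they would come from a learned network conditioned on the input) and that gates are sampled with a Gumbel-sigmoid relaxation and then thresholded. The names `gate_mask` and `gated_linear_predict` and the 0.5 threshold are hypothetical, and gradient handling (for example a straight-through estimator) is omitted.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def gate_mask(
    logits: Sequence[float],
    temperature: float = 0.5,
    threshold: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hard 0/1 feature mask from per-feature gate logits (hypothetical helper).

    With `rng`, logistic noise is added first (a Gumbel-sigmoid / binary-concrete
    sample, as one might use during training); without it the mask is
    deterministic, as one might use at inference time.
    """
    mask = []
    for logit in logits:
        if rng is not None:
            u = min(max(rng.random(), 1e-9), 1 - 1e-9)
            logit += math.log(u) - math.log1p(-u)  # logistic noise
        soft = 1.0 / (1.0 + math.exp(-logit / temperature))  # relaxed gate in (0, 1)
        mask.append(1 if soft > threshold else 0)  # hard gate: keep or drop the feature
    return mask


def gated_linear_predict(
    x: Sequence[float],
    gate_logits: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
    **gate_kwargs,
) -> Tuple[float, List[int]]:
    """Mask the features first, then predict only from the features that survive."""
    mask = gate_mask(gate_logits, **gate_kwargs)
    score = sum(xi * mi * wi for xi, mi, wi in zip(x, mask, weights)) + bias
    return score, mask
```

For example, `gated_linear_predict([1.0, 2.0, 3.0, 4.0], [2.0, -1.0, 0.5, -3.0], [1.0] * 4)` keeps only the first and third features, so the returned mask itself is the explanation of which inputs the prediction used.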
Peer Reviews
Decision: ICLR 2025 Poster
## Originality -- high
- ICC group routing is original.
- The paper showed that grouping can be done by an LLM and that it is as good as handcrafted grouping, so this approach can be used at large scale without requiring extra labor.

## Quality -- high
- The paper showed that ICC can be used on different data types, testing it on 3 different data domains.
- ICC was benchmarked against strong DNN baselines and good interpretable baselines across 8 datasets.
- For interpretability, Ope
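The review credits ICC group routing, with feature groups that can be handcrafted or proposed by an LLM. The Python below is a minimal sketch of sparse routing over feature groups, assuming a fixed partition of features into groups, one router score per group, and top-k selection of active groups; the paper's actual router and sparsity mechanism may differ. `group_route_predict`, its parameters, and the toy experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def group_route_predict(
    x: Sequence[float],
    groups: List[List[int]],
    router_logits: Sequence[float],
    experts: List[Callable[[List[float]], float]],
    top_k: int = 1,
) -> Tuple[float, List[int]]:
    """Route to the top_k feature groups; each expert sees only its own group's features.

    groups:        list of feature-index lists, one per group (hypothetical partition)
    router_logits: one score per group (in a real model, produced by a learned router)
    experts:       one predictor per group
    """
    weights = softmax(router_logits)
    active = sorted(range(len(groups)), key=lambda g: weights[g], reverse=True)[:top_k]
    norm = sum(weights[g] for g in active)  # renormalize over the active groups only
    pred = 0.0
    for g in active:
        group_features = [x[i] for i in groups[g]]
        pred += (weights[g] / norm) * experts[g](group_features)
    return pred, sorted(active)
```

The returned list of active groups doubles as the explanation. Because there is one expert per group, the number of expert modules grows linearly with the number of groups.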
## Significance -- Medium
- ICC feature gating is almost the same as the SENN feature, except with a sparse mask.
- ICC group routing might be inefficient when the number of groups increases significantly, since model complexity increases as well.
- ICC sparsity depends on the temperature of the Gumbel Softmax, but its effect was not investigated in the paper.
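The review notes that ICC sparsity depends on the Gumbel Softmax temperature. The sketch below illustrates why using the standard Gumbel-softmax relaxation in general, not the paper's training configuration: with the noise held fixed, dividing the perturbed logits by a smaller temperature concentrates the probability mass on one entry, so a downstream threshold keeps fewer entries. The logits, seed, and temperatures are arbitrary illustrative values.

```python
import math
import random
from typing import List, Sequence


def sample_gumbel(n: int, rng: random.Random) -> List[float]:
    """Standard Gumbel(0, 1) noise via the inverse CDF: -log(-log(u))."""
    return [-math.log(-math.log(min(max(rng.random(), 1e-12), 1 - 1e-12))) for _ in range(n)]


def gumbel_softmax(logits: Sequence[float], noise: Sequence[float], temperature: float) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + noise) / temperature)."""
    y = [(z + g) / temperature for z, g in zip(logits, noise)]
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [1.0, 0.5, 0.0, -0.5]
noise = sample_gumbel(len(logits), rng)  # reuse the same noise so only the temperature varies
for t in (5.0, 1.0, 0.1):
    probs = gumbel_softmax(logits, noise, t)
    print(f"temperature={t}: max prob={max(probs):.3f}")
```

The largest probability can only grow as the temperature drops, because every non-maximal term in the softmax denominator shrinks.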
1. **Simple, intuitive, and novel idea for the design of intrinsically interpretable neural network architectures.** The authors present a novel idea for the design of neural network-based models that are intrinsically interpretable yet retain strong performance.
2. **Thorough experimental analysis.** The authors evaluate their method on five datasets, spanning several different domains and modalities. They analyze both the performance of the model and the quality of the explanations it produces
1. **Missing comparison/discussion of prior work on explain-then-predict / extractive rationale methods.** There is a substantial amount of existing work on intrinsically interpretable models that involve the same two basic steps proposed in this work: (1) select a subset of the input as the “explanation”/“rationale” and (2) use a model that sees only this explanation to make the final prediction. A lot of this has been done in the NLP space; see the discussion in Section 4.5.2 in [1], and the s
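The two-step explain-then-predict pattern described above (select a rationale, then predict only from it) can be made concrete with a toy sketch. The selector and predictor here are a hypothetical hand-written lexicon rather than learned models, and hard top-k selection is one simple choice among many; the point is only that the final prediction cannot depend on tokens outside the rationale.

```python
from typing import Callable, List, Sequence, Tuple


def explain_then_predict(
    tokens: Sequence[str],
    score: Callable[[str], float],
    predict: Callable[[List[str]], int],
    k: int = 2,
) -> Tuple[List[str], int]:
    """(1) Select the k highest-scoring tokens as the rationale.
    (2) Predict from the rationale alone, so the prediction depends only on it."""
    ranked = sorted(range(len(tokens)), key=lambda i: score(tokens[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore original token order
    rationale = [tokens[i] for i in keep]
    return rationale, predict(rationale)


# Hypothetical toy sentiment lexicon standing in for learned selector/predictor models.
LEXICON = {"great": 2.0, "fun": 1.5, "plot": 0.1, "boring": -2.0}


def lexicon_score(token: str) -> float:
    return abs(LEXICON.get(token, 0.0))


def lexicon_predict(rationale: List[str]) -> int:
    return int(sum(LEXICON.get(t, 0.0) for t in rationale) > 0)


tokens = "the acting was great and the plot was fun".split()
rationale, label = explain_then_predict(tokens, lexicon_score, lexicon_predict, k=2)
print(rationale, label)  # prints ['great', 'fun'] 1
```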
The unique strength of this paper is that it brings user-centric design into intrinsic explanation models, aiming to address the applicability of the developed models in general domains. It is nice to see efforts to push model design toward user-centric design. It extends intrinsic explanation models from vision to other modalities, especially time series, text, and tabular datasets. It has conducted a systematic evaluation across four domains against different baseline explanation models and meas
- In real-world scenarios, explanations are not just a set of features but also the interactions between pairs of features. Do you consider identifying feature interactions in your InterpretCC framework?
- In your user evaluation, since your method provides local interpretations, it is mentioned that InterpretCC can recommend interpretations like “This student was predicted to pass the course because and only because of the student’s regularity and video watching behavior”. How can you prov
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
