Interpretability with full complexity by constraining feature information
Kieran A. Murphy, Dani S. Bassett

TL;DR
This paper introduces a novel interpretability method for machine learning models that constrains feature information using the Distributed Information Bottleneck, enabling detailed analysis of feature importance and interactions without limiting model complexity.
Contribution
It proposes a new information-theoretic approach to interpretability that preserves model complexity while providing rich insights into feature relevance and interactions.
Findings
Effective feature importance analysis across complex models
Enhanced interpretability without restricting model complexity
Demonstrated utility on various tabular datasets
Abstract
Interpretability is a pressing issue for machine learning. Common approaches to interpretable machine learning constrain interactions between features of the input, rendering the effects of those features on a model's output comprehensible but at the expense of model complexity. We approach interpretability from a new angle: constrain the information about the features without restricting the complexity of the model. Borrowing from information theory, we use the Distributed Information Bottleneck to find optimal compressions of each feature that maximally preserve information about the output. The learned information allocation, by feature and by feature value, provides rich opportunities for interpretation, particularly in problems with many features and complex feature interactions. The central object of analysis is not a single trained model, but rather a spectrum of models serving…
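As a rough illustration of the trade-off the abstract describes, the sketch below scores each feature's compressed representation separately and adds a prediction term. It is not the authors' implementation. It assumes Gaussian encoders compared against a standard-normal prior, so that a KL divergence upper-bounds the information each compressed feature carries. It also assumes a weight `beta` on the summed information cost. None of these choices appear in the summary above, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """Per-sample KL( N(mu, diag(exp(log_var))) || N(0, I) ), in nats.

    Assumption: each feature is encoded as a diagonal Gaussian; this KL
    is a standard variational upper bound on the information the
    encoding carries about its feature.
    """
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

def distributed_ib_loss(mus, log_vars, prediction_loss, beta):
    """Hypothetical loss: beta * (sum of per-feature KL) + prediction loss.

    mus, log_vars: lists holding one (batch, dim) array per feature.
    Returns the total loss and the per-feature KL values (batch mean).
    """
    per_feature_kl = np.array([
        gaussian_kl_to_standard_normal(mu, lv).mean()
        for mu, lv in zip(mus, log_vars)
    ])
    return beta * per_feature_kl.sum() + prediction_loss, per_feature_kl

# Two features, one sample each: the first encoding matches the prior
# (no information retained), the second has its mean shifted by 1.
mus = [np.zeros((1, 1)), np.ones((1, 1))]
log_vars = [np.zeros((1, 1)), np.zeros((1, 1))]
total, per_feature_kl = distributed_ib_loss(mus, log_vars, prediction_loss=0.3, beta=2.0)
```

Under these assumptions, the per-feature KL values play the role of the "information allocation by feature" mentioned in the abstract, and training at different values of `beta` would yield models that retain different amounts of feature information.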
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Feature Selection
