Understanding Interpretability by generalized distillation in Supervised Classification
Adit Agarwal, K.K. Shukla, Arjan Kuijper, Anirban Mukhopadhyay

TL;DR
This paper introduces a novel, information-theoretic approach to interpretability in supervised classification, using generalized distillation to quantify model interpretability without human bias, supported by theoretical bounds and empirical validation.
Contribution
The paper generalizes the distillation technique to quantify interpretability, and provides the first theoretical bounds on the interpretability of Piece-Wise Linear Neural Networks (PWLNs).
Findings
The framework quantifies interpretability via entropy bounds.
Theoretical bounds on PWLN interpretability are established.
The framework is validated empirically on the MNIST, Fashion-MNIST, and Stanford40 datasets.
Abstract
The ability to interpret decisions taken by Machine Learning (ML) models is fundamental to encourage trust and reliability in different practical applications. Recent interpretation strategies focus on human understanding of the underlying decision mechanisms of the complex ML models. However, these strategies are restricted by the subjective biases of humans. To dissociate from such human biases, we propose an interpretation-by-distillation formulation that is defined relative to other ML models. We generalize the distillation technique for quantifying interpretability, using an information-theoretic perspective, removing the role of ground-truth from the definition of interpretability. Our work defines the entropy of supervised classification models, providing bounds on the entropy of Piece-Wise Linear Neural Networks (PWLNs), along with the first theoretical bounds on the…
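The interpretation-by-distillation idea in the abstract, where one model is assessed relative to another without reference to ground truth, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's formulation: the teacher is a small random ReLU network (a piece-wise linear model), the student is a softmax-linear classifier, and `fidelity` is a plain agreement rate rather than the entropy-based quantity the authors define. Note that no ground-truth labels appear anywhere in the code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def teacher_probs(x, w1, b1, w2, b2):
    """Hypothetical teacher: a one-hidden-layer ReLU network (piece-wise linear)."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return softmax(h @ w2 + b2)

def distill_linear_student(x, p_teacher, steps=500, lr=0.5):
    """Fit a softmax-linear student to the teacher's soft outputs.

    Minimises the mean cross-entropy between teacher and student
    distributions with plain gradient descent.
    """
    n, d = x.shape
    k = p_teacher.shape[1]
    w = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(steps):
        q = softmax(x @ w + b)
        grad = (q - p_teacher) / n  # gradient of the loss w.r.t. the logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

def fidelity(p_teacher, p_student):
    """Fraction of inputs on which student and teacher predict the same class."""
    return float(np.mean(p_teacher.argmax(axis=1) == p_student.argmax(axis=1)))

# Synthetic inputs only; the teacher's outputs are the sole training signal.
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 5))
w1, b1 = rng.normal(size=(5, 16)), rng.normal(size=16)
w2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)

p_t = teacher_probs(x, w1, b1, w2, b2)
w, b = distill_linear_student(x, p_t)
p_s = softmax(x @ w + b)

print(f"student-teacher agreement: {fidelity(p_t, p_s):.3f}")
```

In this toy setting, a higher agreement rate means the simpler student reproduces more of the teacher's behaviour. That is only a rough proxy for the relative notion of interpretability the abstract describes.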
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Interpretability
