Understanding Interpretability by generalized distillation in Supervised Classification

Adit Agarwal; K.K. Shukla; Arjan Kuijper; Anirban Mukhopadhyay

arXiv:2012.03089·cs.LG·September 5, 2024

TL;DR

This paper introduces a novel, information-theoretic approach to interpretability in supervised classification, using generalized distillation to quantify model interpretability without human bias, supported by theoretical bounds and empirical validation.

Contribution

The paper generalizes the distillation technique to quantify interpretability and provides the first theoretical bounds on the interpretability of Piece-Wise Linear Neural Networks (PWLNs).

Findings

01. The framework quantifies interpretability via entropy bounds.

02. Theoretical bounds on PWLN interpretability are established.

03. The framework is validated empirically on the MNIST, Fashion-MNIST, and Stanford40 datasets.

Abstract

The ability to interpret decisions taken by Machine Learning (ML) models is fundamental to encourage trust and reliability in different practical applications. Recent interpretation strategies focus on human understanding of the underlying decision mechanisms of the complex ML models. However, these strategies are restricted by the subjective biases of humans. To dissociate from such human biases, we propose an interpretation-by-distillation formulation that is defined relative to other ML models. We generalize the distillation technique for quantifying interpretability, using an information-theoretic perspective, removing the role of ground-truth from the definition of interpretability. Our work defines the entropy of supervised classification models, providing bounds on the entropy of Piece-Wise Linear Neural Networks (PWLNs), along with the first theoretical bounds on the…
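
As a rough illustration of the interpretation-by-distillation idea described above, the following sketch trains a simple student model on a teacher's predictions only, with no ground-truth labels, and reports how often the two agree. Everything in it is an assumption made for illustration: the piecewise-linear `teacher` rule, the nearest-centroid student, and the agreement score `fidelity` are hypothetical stand-ins, and the score is a plain agreement rate rather than the entropy-based measure the paper defines.

```python
import random

def fidelity(teacher, student, inputs):
    """Fraction of inputs on which the student reproduces the teacher's label."""
    return sum(teacher(x) == student(x) for x in inputs) / len(inputs)

def fit_centroid_student(teacher, inputs):
    """Fit a nearest-centroid student on teacher labels only (no ground truth)."""
    sums, counts = {}, {}
    for x in inputs:
        label = teacher(x)
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def student(x):
        # Predict the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return student

# Hypothetical teacher: a small piecewise-linear rule on 2-D inputs.
def teacher(x):
    return int(max(x[0], 0.0) + x[1] > 0.5)

rng = random.Random(0)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
test = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]

student = fit_centroid_student(teacher, train)
score = fidelity(teacher, student, test)
print(f"student/teacher agreement: {score:.3f}")
```

Because the student sees only the teacher's outputs, the score describes one model relative to another rather than relative to the true labels, which is the sense in which ground truth drops out of the definition.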

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Interpretability