Conditional Autoregressors are Interpretable Classifiers

Nathan Elazar

arXiv:2203.17002·cs.LG·April 1, 2022

TL;DR

This paper demonstrates that class-conditional autoregressive (CA) models can serve as inherently interpretable image classifiers on MNIST and that, with knowledge distillation from a standard classifier, they can match the teacher's accuracy.

Contribution

The paper explores class-conditional autoregressive (CA) models as classifiers and shows that they can be trained to remain interpretable without sacrificing accuracy.

Findings

01. CA models are inherently locally interpretable.

02. Naive training of CA models results in poor accuracy due to overfitting.

03. Knowledge distillation enables CA models to match standard classifiers' performance.
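The third finding can be illustrated with a minimal sketch of a distillation objective. The summary does not state the exact loss the paper uses, so this assumes a common formulation: the student CA's class posterior is obtained by Bayes' rule from its per-class log-likelihoods, and it is trained to minimize the cross-entropy against the teacher's predicted class probabilities. All function names here are hypothetical.

```python
import math

def log_softmax(scores):
    # Numerically stable log-softmax over a list of scores.
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def student_log_posterior(class_log_likelihoods, log_prior):
    # Bayes' rule: log p(c | x) = log p(x | c) + log p(c) - log p(x).
    joint = [ll + lp for ll, lp in zip(class_log_likelihoods, log_prior)]
    return log_softmax(joint)

def distillation_loss(teacher_probs, class_log_likelihoods, log_prior):
    # Cross-entropy between teacher soft labels and the student posterior
    # (an assumed objective, not necessarily the paper's).
    log_q = student_log_posterior(class_log_likelihoods, log_prior)
    return -sum(p * lq for p, lq in zip(teacher_probs, log_q))

# Toy example with three classes and a uniform prior.
teacher_probs = [0.7, 0.2, 0.1]
log_prior = [math.log(1.0 / 3.0)] * 3

# A student whose likelihoods are proportional to the teacher's probabilities.
matched = [math.log(p) - 50.0 for p in teacher_probs]
# A student that prefers the wrong class.
mismatched = [-52.0, -50.0, -51.0]

loss_matched = distillation_loss(teacher_probs, matched, log_prior)
loss_mismatched = distillation_loss(teacher_probs, mismatched, log_prior)
teacher_entropy = -sum(p * math.log(p) for p in teacher_probs)
```

Under this objective the loss is bounded below by the teacher's entropy and reaches that bound only when the student posterior equals the teacher's distribution, which is why `loss_matched` is smaller than `loss_mismatched`.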

Abstract

We explore the use of class-conditional autoregressive (CA) models to perform image classification on MNIST-10. Autoregressive models assign probability to an entire input by combining probabilities from each individual feature; hence classification decisions made by a CA can be readily decomposed into contributions from each input feature. That is to say, CA models are inherently locally interpretable. Our experiments show that naively training a CA achieves much worse accuracy than a standard classifier; however, this is due to overfitting and not a lack of expressive power. Using knowledge distillation from a standard classifier, a student CA can be trained to match the performance of the teacher while still being interpretable.
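The decomposition described in the abstract can be sketched on a toy problem. This is not the paper's architecture: it assumes binary features and a simple logistic autoregressor per class, with made-up parameters, purely to show that log p(x | c) is a sum of per-feature terms and that the log-odds between two classes splits into one contribution per feature.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def per_feature_log_probs(x, bias, weights):
    # Returns [log p(x_i | x_<i, c)] for one class; feature i only sees x_0..x_{i-1}.
    terms = []
    for i, xi in enumerate(x):
        logit = bias[i] + sum(weights[i][j] * x[j] for j in range(i))
        terms.append(log_sigmoid(logit) if xi == 1 else log_sigmoid(-logit))
    return terms

def classify(x, params, log_prior):
    # Score each class by log p(c) + sum_i log p(x_i | x_<i, c).
    per_class = {c: per_feature_log_probs(x, *params[c]) for c in params}
    scores = {c: log_prior[c] + sum(t) for c, t in per_class.items()}
    pred = max(scores, key=scores.get)
    return pred, scores, per_class

def contributions(per_class, a, b):
    # Per-feature evidence for class a over class b.
    return [ta - tb for ta, tb in zip(per_class[a], per_class[b])]

# Hypothetical parameters: (bias, lower-triangular weights) for each class.
params = {
    0: ([-1.0, 0.5, 0.0], [[], [2.0], [0.5, -1.0]]),
    1: ([1.0, -0.5, 0.0], [[], [-2.0], [-0.5, 1.0]]),
}
log_prior = {0: math.log(0.5), 1: math.log(0.5)}

x = [1, 0, 1]
pred, scores, per_class = classify(x, params, log_prior)
contribs = contributions(per_class, 0, 1)
```

With equal priors, the entries of `contribs` sum exactly to `scores[0] - scores[1]`, so each entry can be read as how much that feature pushed the decision toward class 0 (positive) or class 1 (negative). For images, the same per-pixel terms could be displayed as a heatmap.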

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks

Methods: Knowledge Distillation