Prototype selection for interpretable classification
Jacob Bien, Robert Tibshirani

TL;DR
This paper introduces a prototype selection method for classification that identifies a minimal, representative subset of samples to enhance interpretability and efficiency in large data sets.
Contribution
It proposes a set cover-based approach for selecting prototypes, emphasizing interpretability and sparsity in the samples rather than in the variables.
Findings
The method condenses a data set to a short list of representative samples that can still be used for classification.
Prototypes provide meaningful interpretative summaries of data.
The approach can be used both for interpretability and as a classifier.
Abstract
Prototype methods seek a minimal subset of samples that can serve as a distillation or condensed view of a data set. As the size of modern data sets grows, being able to present a domain specialist with a short list of "representative" samples chosen from the data set is of increasing interpretative value. While much recent statistical research has been focused on producing sparse-in-the-variables methods, this paper aims at achieving sparsity in the samples. We discuss a method for selecting prototypes in the classification setting (in which the samples fall into known discrete categories). Our method of focus is derived from three basic properties that we believe a good prototype set should satisfy. This intuition is translated into a set cover optimization problem, which we solve approximately using standard approaches. While prototype selection is usually viewed as purely a means…
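To make the set cover idea concrete, here is a minimal sketch of a greedy prototype selector. It is an illustration under stated assumptions, not the authors' exact formulation: the function name `select_prototypes`, the ball radius `eps`, the per-prototype cost `lam`, and the choice of Euclidean distance are all hypothetical. Each training point is a candidate prototype for each class; a candidate "covers" the same-class points within `eps` of it, and is penalized for other-class points that fall in the same ball.

```python
import numpy as np


def select_prototypes(X, y, eps, lam=0.0):
    """Greedy set-cover-style prototype selection (illustrative sketch).

    X: (n, d) array of samples; y: (n,) array of class labels.
    eps: radius of the ball each prototype covers (assumed parameter).
    lam: non-negative cost charged per prototype (assumed parameter).
    Returns a list of (sample_index, class_label) pairs.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative so the loop terminates")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # in_ball[j, i] is True when sample i lies within eps of candidate j.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    in_ball = dist <= eps

    covered = np.zeros(len(X), dtype=bool)  # covered by a same-class prototype
    prototypes = []
    while True:
        best_gain, best = 0.0, None
        for label in np.unique(y):
            same = y == label
            # Same-class points this candidate would newly cover.
            newly = (in_ball & same & ~covered).sum(axis=1)
            # Other-class points that would wrongly fall inside its ball.
            wrong = (in_ball & ~same).sum(axis=1)
            gain = newly - wrong - lam
            j = int(np.argmax(gain))
            if gain[j] > best_gain:
                best_gain, best = gain[j], (j, label)
        if best is None:  # no candidate improves the objective any further
            break
        j, label = best
        prototypes.append((j, label))
        covered |= in_ball[j] & (y == label)
    return prototypes
```

On two well-separated clusters with a radius larger than the cluster diameter, this sketch selects one prototype per class. A new point could then be classified by the label of its nearest selected prototype, which is one way to read the remark that the prototype set can also act as a classifier.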
