Efficient Data Representation by Selecting Prototypes with Importance Weights

Karthik S. Gurumoorthy, Amit Dhurandhar, Guillermo Cecchi, and Charu Aggarwal

arXiv:1707.01212·stat.ML·August 13, 2019

TL;DR

This paper introduces a theoretically grounded, efficient algorithm for selecting prototypes and outliers with importance weights across diverse domains, enhancing data interpretability and insight extraction.

Contribution

The paper generalizes existing prototype selection methods, notably Kim et al. (2016), by attaching non-negative importance weights to the selected prototypes. The framework applies to any symmetric positive definite kernel and comes with proven approximation guarantees.

Findings

01

Effective prototype and outlier selection demonstrated on retail, MNIST, and CDC health data.

02

Quantitative and qualitative validation confirms improved interpretability and insight.

03

Algorithm offers fast, theoretically supported solutions with broad applicability.

Abstract

Prototypical examples that best summarize and compactly represent an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper we present algorithms with strong theoretical guarantees to mine these data sets and select prototypes, a.k.a. representatives, that optimally describe them. Our work notably generalizes the recent work by Kim et al. (2016): in addition to selecting prototypes, we also associate with them non-negative weights that are indicative of their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e., outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys a…
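To make the idea of "prototypes with non-negative importance weights" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the objective (a concave quadratic that matches kernel means, f(w) = μᵀw − ½ wᵀKw with w ≥ 0), the greedy selection rule, the coordinate-ascent weight update, and every name (`rbf`, `select_prototypes`, `gamma`, `sweeps`) are assumptions chosen for illustration. The Gaussian kernel is just one example of a symmetric positive definite kernel.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel, one example of a symmetric positive definite kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def select_prototypes(target, candidates, m, gamma=1.0, sweeps=200):
    """Greedily pick up to m candidates and fit non-negative weights.

    Assumed objective (illustrative): f(w) = mu^T w - 0.5 * w^T K w, w >= 0,
    where mu[j] is the mean kernel similarity between candidate j and the target set.
    """
    n = len(candidates)
    K = [[rbf(a, b, gamma) for b in candidates] for a in candidates]
    mu = [sum(rbf(t, c, gamma) for t in target) / len(target) for c in candidates]

    selected, w = [], {}
    for _ in range(m):
        # Gradient of f at the current weights, for candidates not yet chosen.
        grad = {j: mu[j] - sum(K[j][s] * w[s] for s in selected)
                for j in range(n) if j not in w}
        if not grad:
            break
        best = max(grad, key=grad.get)
        if grad[best] <= 0:
            break  # no remaining candidate can increase the objective
        selected.append(best)
        w[best] = 0.0
        # Re-fit all weights on the selected set by projected coordinate ascent.
        for _ in range(sweeps):
            for j in selected:
                rest = sum(K[j][s] * w[s] for s in selected if s != j)
                w[j] = max(0.0, (mu[j] - rest) / K[j][j])
    return selected, [w[j] for j in selected]

# Toy data: a cluster of four points near 0 and a cluster of three points near 5.
data = [(0.0,), (0.1,), (-0.1,), (0.05,), (5.0,), (5.1,), (4.9,)]
selected, weights = select_prototypes(data, data, m=2)
print(selected, weights)
```

On this toy data the sketch picks one point from each cluster, and the prototype from the larger cluster receives the larger weight, which is the sense in which a weight indicates importance. Candidates whose similarity to the data is poorly explained by the chosen prototypes are natural outlier candidates, but how the paper scores criticisms is not specified in the text above.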

Code & Models

Repositories

dwiggles/AIX360-withdata
tf
