Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability

Prathyush Sambaturu; Aparna Gupta; Ian Davidson; S. S. Ravi; Anil Vullikanti; Andrew Warren

arXiv:2002.02487·cs.DS·February 10, 2020·1 cites

TL;DR

This paper introduces efficient algorithms with provable guarantees for generating minimal, disjoint tag sets to explain clusters, enhancing interpretability in machine learning applications like genomic threat level classification.

Contribution

The paper extends previous work by developing approximation algorithms, with performance guarantees, for an NP-hard cluster explanation problem.

Findings

01. Algorithms achieve near-optimal tag set sizes.

02. Applications include genomic sequence cluster explanations.

03. Performance guarantees are proven for the algorithms.

Abstract

Improving the explainability of the results from machine learning methods has become an important research goal. Here, we study the problem of making clusters more interpretable by extending a recent approach of [Davidson et al., NeurIPS 2018] for constructing succinct representations for clusters. Given a set of objects $S$, a partition $\pi$ of $S$ (into clusters), and a universe $T$ of tags such that each element in $S$ is associated with a subset of tags, the goal is to find a representative set of tags for each cluster such that those sets are pairwise-disjoint and the total size of all the representatives is minimized. Since this problem is NP-hard in general, we develop approximation algorithms with provable performance guarantees for the problem. We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat…
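To make the problem statement in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it is a simple greedy heuristic with no approximation guarantee, included only to illustrate the inputs and the disjointness constraint. It assumes that a tag set "represents" a cluster when every object in the cluster carries at least one tag from that set; the abstract does not spell this out, so treat it as an assumption. The names tags_of, clusters, is_valid_descriptor, and greedy_descriptors are hypothetical.

```python
from typing import Dict, List, Optional, Set


def is_valid_descriptor(
    tags_of: Dict[str, Set[str]],
    clusters: List[Set[str]],
    descriptors: List[Set[str]],
) -> bool:
    """Check pairwise disjointness and (assumed) coverage of every object."""
    # Pairwise disjoint iff the sizes add up to the size of the union.
    union: Set[str] = set().union(*descriptors) if descriptors else set()
    if sum(len(d) for d in descriptors) != len(union):
        return False
    # Assumption: each object needs at least one tag from its cluster's set.
    return all(
        tags_of[obj] & desc
        for cluster, desc in zip(clusters, descriptors)
        for obj in cluster
    )


def greedy_descriptors(
    tags_of: Dict[str, Set[str]],
    clusters: List[Set[str]],
) -> Optional[List[Set[str]]]:
    """Greedy heuristic: repeatedly assign the unused tag that covers the
    most still-uncovered objects of some cluster. Returns None if it gets
    stuck, which does not by itself prove that no solution exists."""
    uncovered = [set(c) for c in clusters]
    descriptors: List[Set[str]] = [set() for _ in clusters]
    used: Set[str] = set()
    while any(uncovered):
        best_gain, best_cluster, best_tag = 0, -1, ""
        for i, remaining in enumerate(uncovered):
            counts: Dict[str, int] = {}
            for obj in remaining:
                for tag in tags_of[obj]:
                    if tag not in used:
                        counts[tag] = counts.get(tag, 0) + 1
            # Sorted iteration keeps tie-breaking deterministic.
            for tag in sorted(counts):
                if counts[tag] > best_gain:
                    best_gain, best_cluster, best_tag = counts[tag], i, tag
        if best_gain == 0:
            return None  # no unused tag covers any remaining object
        descriptors[best_cluster].add(best_tag)
        used.add(best_tag)
        uncovered[best_cluster] = {
            obj for obj in uncovered[best_cluster]
            if best_tag not in tags_of[obj]
        }
    return descriptors


# Toy instance: four objects, two clusters.
tags_of = {
    "a": {"red", "round"},
    "b": {"red", "small"},
    "c": {"blue", "round"},
    "d": {"blue", "small"},
}
clusters = [{"a", "b"}, {"c", "d"}]
result = greedy_descriptors(tags_of, clusters)
print(result, sum(len(d) for d in result))
```

On this toy instance the greedy picks one tag per cluster, for a total descriptor size of 2. On harder instances a greedy choice can use up a tag that another cluster needed, which is one reason approximation algorithms with proven guarantees, as developed in the paper, are of interest.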

Code & Models

Repositories

prathyush6/ExplainabilityCodeAAAI20
none · Official

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification