Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability
Prathyush Sambaturu, Aparna Gupta, Ian Davidson, S. S. Ravi, Anil Vullikanti, Andrew Warren

TL;DR
This paper introduces efficient approximation algorithms with provable guarantees for generating small, pairwise-disjoint tag sets that explain clusters, improving interpretability in machine learning applications such as clusters of genomic sequences grouped by threat level.
Contribution
The paper extends previous work by developing approximation algorithms with performance guarantees for an NP-hard cluster explanation problem.
Findings
Algorithms achieve near-optimal tag set sizes.
Applications include genomic sequence cluster explanations.
Performance guarantees are proven for the algorithms.
Abstract
Improving the explainability of the results from machine learning methods has become an important research goal. Here, we study the problem of making clusters more interpretable by extending a recent approach of [Davidson et al., NeurIPS 2018] for constructing succinct representations for clusters. Given a set of objects S, a partition π of S (into clusters), and a universe T of tags such that each element in S is associated with a subset of tags, the goal is to find a representative set of tags for each cluster such that those sets are pairwise-disjoint and the total size of all the representatives is minimized. Since this problem is NP-hard in general, we develop approximation algorithms with provable performance guarantees for the problem. We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat…
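To make the problem statement in the abstract concrete, here is a minimal Python sketch of the objective on a toy instance. It is not the paper's approximation algorithm; it is an exhaustive search that only works for tiny inputs, which is also why approximation algorithms are needed in practice. The function names (`is_valid`, `brute_force_descriptors`) and the toy data are hypothetical. The sketch also assumes that a cluster's tag set must cover every object in that cluster (each object shares at least one tag with its cluster's set), a requirement the abstract does not spell out.

```python
from itertools import product


def is_valid(clusters, tags_of, descriptors):
    """Check pairwise disjointness and (assumed) full coverage."""
    used = set()
    for desc in descriptors:
        if used & desc:  # a tag appears in two descriptors
            return False
        used |= desc
    # Assumption: every object must share at least one tag with
    # the descriptor of its own cluster.
    return all(
        tags_of[obj] & desc
        for cluster, desc in zip(clusters, descriptors)
        for obj in cluster
    )


def brute_force_descriptors(clusters, tags_of):
    """Return a minimum-total-size valid descriptor list, or None."""
    universe = sorted(set().union(*tags_of.values()))
    k = len(clusters)
    best = None
    # Each tag is assigned to one cluster (0..k-1) or left unused (k),
    # which enforces disjointness by construction.
    for assignment in product(range(k + 1), repeat=len(universe)):
        descriptors = [set() for _ in range(k)]
        for tag, owner in zip(universe, assignment):
            if owner < k:
                descriptors[owner].add(tag)
        if is_valid(clusters, tags_of, descriptors):
            size = sum(len(d) for d in descriptors)
            if best is None or size < sum(len(d) for d in best):
                best = descriptors
    return best


# Toy instance (hypothetical data): two clusters, three tags.
clusters = [{"a", "b"}, {"c", "d"}]
tags_of = {"a": {"x", "y"}, "b": {"x"}, "c": {"y", "z"}, "d": {"z"}}
best = brute_force_descriptors(clusters, tags_of)
print(best)
```

On this instance, object "b" carries only tag "x" and object "d" carries only tag "z", so the first cluster must use "x" and the second must use "z"; those two tags already cover everything, giving a total size of 2. The search enumerates (k + 1)^|T| assignments for k clusters and |T| tags, so it is only an illustration of the objective and constraints.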
Taxonomy
Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
