Explainable Clustering Beyond Worst-Case Guarantees

Maximilian Fleissner; Maedeh Zarvandi; Debarghya Ghoshdastidar

arXiv:2411.01576 · cs.LG · August 8, 2025

TL;DR

This paper investigates explainable clustering using decision trees, demonstrating improved guarantees for well-clustered data and kernel clustering within a statistical mixture model framework, surpassing worst-case bounds.

Contribution

It introduces a data-dependent algorithm for explainable clustering that achieves tighter guarantees on well-clustered data and extends analysis to kernel clustering.

Findings

01. Better guarantees for explainable clustering on well-clustered data.

02. The algorithm constructs the tree in data-independent time.

03. Improved bounds for kernel clustering.

Abstract

We study the explainable clustering problem first posed by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020). The goal of explainable clustering is to fit an axis-aligned decision tree with $K$ leaves and minimal clustering cost (where every leaf is a cluster). The fundamental theoretical question in this line of work is the price of explainability, defined as the ratio between the clustering cost of the tree and the optimal cost. Numerous papers have provided worst-case guarantees on this quantity. For $K$-medians, it has recently been shown that the worst-case price of explainability is $\Theta(\log K)$. While this settles the matter from a data-agnostic point of view, two important questions remain unanswered: Are tighter guarantees possible for well-clustered data? And can we trust decision trees to recover underlying cluster structures? In this paper, we place…
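To make the price of explainability concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a toy 2-D dataset, $K = 2$, the $K$-medians cost under the $\ell_1$ distance, and a hand-picked axis-aligned threshold; the helper names (`kmedians_cost`, `tree_clusters`, `optimal_cost`) are hypothetical. The optimal cost is found by brute force over all assignments, which is only feasible for a handful of points.

```python
from itertools import product
from statistics import median


def kmedians_cost(clusters):
    """Sum of L1 distances from each point to its cluster's coordinate-wise median."""
    total = 0.0
    for pts in clusters:
        if not pts:
            continue  # an empty cluster contributes no cost
        center = [median(coord) for coord in zip(*pts)]
        total += sum(sum(abs(x - c) for x, c in zip(p, center)) for p in pts)
    return total


def tree_clusters(points, axis, threshold):
    """Depth-1 axis-aligned tree: one threshold cut, two leaves (K = 2)."""
    left = [p for p in points if p[axis] <= threshold]
    right = [p for p in points if p[axis] > threshold]
    return [left, right]


def optimal_cost(points, k):
    """Brute-force optimum over all k-labelings (toy sizes only)."""
    best = float("inf")
    for labels in product(range(k), repeat=len(points)):
        clusters = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
        best = min(best, kmedians_cost(clusters))
    return best


# Assumed toy data: two well-separated groups of three points each.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]

tree_cost = kmedians_cost(tree_clusters(points, axis=0, threshold=5))
opt = optimal_cost(points, k=2)
price = tree_cost / opt  # price of explainability for this tree on this data

print(f"tree cost = {tree_cost}, optimal cost = {opt}, price = {price}")
```

On this well-separated toy data a single threshold cut reproduces the optimal partition, so the ratio is 1. The ratio can exceed 1 when no axis-aligned cut separates the optimal clusters, which is the regime the worst-case $\Theta(\log K)$ bound describes.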

Taxonomy

Topics: Bayesian Methods and Mixture Models