The Condition-Number Principle for Prototype Clustering
Romano Li, Jianfei Cao

TL;DR
This paper introduces a geometric framework for prototype clustering that links objective accuracy to structural recovery. It gives deterministic guarantees and clarifies the trade-off between robustness and sensitivity to cluster imbalance.
Contribution
The paper defines a clustering condition number that applies to a broad class of admissible loss functions, giving a geometric principle for interpreting clustering quality and recovery guarantees.
Findings
A small condition number implies that any solution with a small suboptimality gap has low misclassification error relative to a benchmark partition.
Errors concentrate near cluster boundaries.
Deep cluster cores are recovered exactly under local margins.
Abstract
We develop a geometric framework that links objective accuracy to structural recovery in prototype-based clustering. The analysis is algorithm-agnostic and applies to a broad class of admissible loss functions. We define a clustering condition number that compares within-cluster scale to the minimum loss increase required to move a point across a cluster boundary. When this quantity is small, any solution with a small suboptimality gap must also have a small misclassification error relative to a benchmark partition. The framework also clarifies a fundamental trade-off between robustness and sensitivity to cluster imbalance, leading to sharp phase transitions for exact recovery under different objectives. The guarantees are deterministic and non-asymptotic, and they separate the role of algorithmic accuracy from the intrinsic geometric difficulty of the instance. We further show that…
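The abstract describes the condition number only in words: within-cluster scale compared with the minimum loss increase needed to move a point across a cluster boundary. The sketch below is a rough illustration of that ratio, not the paper's definition. It assumes squared Euclidean loss, takes the mean within-cluster loss as the "scale", and takes the smallest per-point loss increase under reassignment as the "margin". The function name `illustrative_condition_number` and both of those choices are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def illustrative_condition_number(points, centers, labels):
    """Hypothetical proxy: mean within-cluster loss / smallest reassignment cost.

    Assumes squared Euclidean loss and at least two prototypes. This is an
    illustrative stand-in, not the quantity defined in the paper.
    """
    own_losses = []
    margins = []
    for p, lab in zip(points, labels):
        own = squared_distance(p, centers[lab])
        # Cheapest loss this point would incur under any other prototype.
        best_other = min(
            squared_distance(p, c) for j, c in enumerate(centers) if j != lab
        )
        own_losses.append(own)
        margins.append(best_other - own)

    scale = sum(own_losses) / len(own_losses)
    margin = min(margins)
    if margin <= 0:
        # Some point is at least as close to another prototype: no margin.
        return float("inf")
    return scale / margin


# Two tight clusters, far apart versus close together.
labels = [0, 0, 1, 1]
far = illustrative_condition_number(
    [(0, 0), (0, 1), (10, 0), (10, 1)], [(0, 0.5), (10, 0.5)], labels
)
near = illustrative_condition_number(
    [(0, 0), (0, 1), (2, 0), (2, 1)], [(0, 0.5), (2, 0.5)], labels
)
print(far, near)  # 0.0025 0.0625
```

Under these assumptions the ratio grows as the clusters move closer together while their internal spread stays fixed, which matches the abstract's intuition that a small value signals an easy instance.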
