Unimodal Strategies in Density-Based Clustering
Oron Nir, Jay Tenenbaum, Ariel Shamir

TL;DR
This paper shows that, in density-based clustering, the number of clusters is nearly unimodal as a function of the neighborhood radius, which enables more efficient radius tuning via Ternary Search on large-scale, high-dimensional data.
Contribution
It reveals a key unimodal property of density-based clustering parameters and leverages it to develop an efficient radius tuning strategy using Ternary Search.
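To illustrate why unimodality helps, here is a minimal sketch of ternary search for the maximizer of a unimodal function of the radius. It is not the authors' implementation. The names `ternary_search_max` and `toy_cluster_count` are hypothetical, and the toy objective is a smooth stand-in for "number of clusters at radius eps", assumed here only so the example runs without a clustering library. The sketch also assumes the goal is to locate the radius where the curve peaks; the summary does not state the exact objective the paper optimizes.

```python
from typing import Callable


def ternary_search_max(
    f: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-3,
    max_iter: int = 100,
) -> float:
    """Approximate the maximizer of a unimodal function f on [lo, hi]."""
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        # Split the current interval into three equal parts.
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            # For a unimodal f, the peak cannot lie left of m1.
            lo = m1
        else:
            # Otherwise the peak cannot lie right of m2.
            hi = m2
    return (lo + hi) / 2


def toy_cluster_count(eps: float) -> float:
    """Hypothetical unimodal stand-in for cluster count; peaks at eps = 1.5."""
    return -(eps - 1.5) ** 2


best_eps = ternary_search_max(toy_cluster_count, 0.0, 5.0)
print(f"estimated best radius: {best_eps:.3f}")
```

Each iteration discards a third of the interval at the cost of two evaluations, so the number of clustering runs grows logarithmically with the required precision rather than linearly as in a grid search. Because the real relation is only nearly unimodal and cluster counts are integers, ties and small local bumps can mislead the comparison, so in practice this would be a heuristic rather than a guarantee.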
Findings
The number of clusters, as a function of the neighborhood radius of core points, is nearly unimodal; this is shown empirically and supported theoretically in a specific setting.
The proposed method improves parameter tuning efficiency for large-scale, high-dimensional data.
Validated across NLP, Audio, and Computer Vision tasks, demonstrating robustness.
Abstract
Density-based clustering methods often surpass centroid-based counterparts when addressing data with noise or arbitrary data distributions common in real-world problems. In this study, we reveal a key property intrinsic to density-based clustering methods regarding the relation between the number of clusters and the neighborhood radius of core points: we empirically show that it is nearly unimodal, and support this claim theoretically in a specific setting. We leverage this property to devise new strategies, based on the Ternary Search algorithm, for finding appropriate values for the radius more efficiently. This is especially important for large-scale, high-dimensional data, where parameter tuning is computationally intensive. We validate our methodology through extensive applications across a range of high-dimensional, large-scale NLP, Audio, and Computer Vision tasks,…
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Stochastic Gradient Optimization Techniques
