TL;DR
Clust-Splitter is a nonsmooth-optimization-based algorithm for minimum sum-of-squares clustering of very large datasets, with reported solution quality comparable to state-of-the-art large-scale methods.
Contribution
The paper introduces Clust-Splitter, a new algorithm that combines the limited memory bundle method with an incremental approach to solve nonsmooth formulations of the large-scale clustering problem.
Findings
Efficiently clusters very large datasets with high accuracy.
Outperforms or matches existing state-of-the-art clustering algorithms.
Demonstrates scalability and solution quality on real-world data.
Abstract
Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the minimum sum-of-squares clustering problem in very large datasets. The clustering task is approached through a sequence of three nonsmooth optimization problems: two auxiliary problems used to generate suitable starting points, followed by a main clustering formulation. To solve these problems effectively, the limited memory bundle method is combined with an incremental approach to develop the Clust-Splitter algorithm. We evaluate Clust-Splitter on real-world datasets characterized by both a large number of attributes and a large number of data points and compare its performance with several state-of-the-art large-scale clustering algorithms. Experimental…
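To make the abstract's terminology concrete, here is a minimal sketch of the minimum sum-of-squares clustering objective in its usual nonsmooth form: the average, over all data points, of the squared Euclidean distance to the nearest cluster center. The inner minimum over centers is what makes the function nonsmooth. The function name `mssc_objective`, the toy data, and the 1/m normalization are assumptions for illustration; this is not the Clust-Splitter algorithm itself, and the paper's exact formulation (including its two auxiliary problems) may differ.

```python
import numpy as np


def mssc_objective(centers, data):
    """Average squared Euclidean distance from each point to its nearest center.

    centers: array of shape (k, d), one row per cluster center
    data:    array of shape (m, d), one row per data point
    """
    # Pairwise differences via broadcasting, shape (m, k, d).
    diffs = data[:, None, :] - centers[None, :, :]
    # Squared distances, shape (m, k).
    sq_dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # The min over centers is the nonsmooth part: it is not differentiable
    # wherever a point is equidistant from two or more centers.
    return sq_dists.min(axis=1).mean()


# Toy data (hypothetical): two well-separated pairs of points.
data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

two_centers = np.array([[0.0, 1.0], [10.0, 1.0]])
one_center = np.array([[5.0, 1.0]])

value_two = mssc_objective(two_centers, data)
value_one = mssc_objective(one_center, data)
```

On this toy data, placing one center in the middle of each pair gives a much smaller objective value than a single center at the overall mean. The dense broadcast above needs memory proportional to m × k × d, so it is suitable only for illustration, not for the very large datasets the paper targets.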
