Clust-Splitter - an Efficient Nonsmooth Optimization-Based Algorithm for Clustering Large Datasets

Jenni Lampainen; Kaisa Joki; Napsu Karmitsa; Marko M. Mäkelä

arXiv:2505.04389·cs.LG·March 19, 2026

TL;DR

Clust-Splitter is a novel nonsmooth optimization-based algorithm that efficiently clusters large-scale datasets, achieving high-quality results comparable to state-of-the-art methods.

Contribution

The paper introduces Clust-Splitter, a new algorithm that combines the limited memory bundle method with an incremental approach to solve the nonsmooth minimum sum-of-squares clustering problem in very large datasets.

Findings

01 Efficiently clusters very large datasets with high accuracy.

02 Outperforms or matches existing state-of-the-art clustering algorithms.

03 Demonstrates scalability and solution quality on real-world data.

Abstract

Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the minimum sum-of-squares clustering problem in very large datasets. The clustering task is approached through a sequence of three nonsmooth optimization problems: two auxiliary problems used to generate suitable starting points, followed by a main clustering formulation. To solve these problems effectively, the limited memory bundle method is combined with an incremental approach to develop the Clust-Splitter algorithm. We evaluate Clust-Splitter on real-world datasets characterized by both a large number of attributes and a large number of data points and compare its performance with several state-of-the-art large-scale clustering algorithms. Experimental…
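To make the minimum sum-of-squares clustering problem mentioned in the abstract concrete, the sketch below evaluates its usual nonsmooth objective: the average, over all data points, of the squared Euclidean distance to the nearest cluster center. This is an illustration only, not the authors' implementation. The function name `mssc_objective`, the 1/m normalization, and the toy data are assumptions made here, and the two auxiliary problems and the limited memory bundle solver are not reproduced.

```python
import numpy as np


def mssc_objective(centers, points):
    """Average squared distance from each point to its nearest center.

    Hypothetical helper for illustration: `centers` has shape (k, n),
    `points` has shape (m, n). The 1/m scaling is an assumed convention.
    """
    # diffs[i, j, :] = points[i] - centers[j]
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)  # shape (m, k)
    # The inner minimum over centers is what makes the objective nonsmooth.
    return float(np.mean(np.min(sq_dists, axis=1)))


# Toy data (made up): two well-separated pairs of points in the plane.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

two_centers = np.array([[0.0, 1.0], [10.0, 1.0]])
one_center = np.array([[5.0, 1.0]])

value_two = mssc_objective(two_centers, points)
value_one = mssc_objective(one_center, points)
print(value_two, value_one)
```

With this toy data, one center per pair gives a much lower objective value than a single center placed between the pairs. Because of the inner minimum, the objective is not differentiable wherever a point is equidistant from two centers, which is why a nonsmooth solver such as a bundle method is a natural fit.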

Code & Models

Repositories

jmlamp/Clust-Splitter (Official)
