A Fast and Effective Method for Euclidean Anticlustering: The Assignment-Based-Anticlustering Algorithm
Philipp Baumann, Olivier Goldschmidt, Dorit S. Hochbaum, Jason Yang

TL;DR
This paper introduces the Assignment-Based Anticlustering (ABA) algorithm, a scalable method for Euclidean anticlustering that outperforms existing heuristics on large-scale datasets.
Contribution
The paper presents ABA, an algorithm that improves both scalability and solution quality for Euclidean anticlustering, especially on large datasets.
Findings
ABA outperforms fast_anticlustering in solution quality and speed.
ABA scales to datasets with millions of objects and hundreds of thousands of anticlusters.
ABA surpasses METIS in both solution quality and runtime for balanced K-cut partitioning.
Abstract
The anticlustering problem is to partition a set of objects into K equal-sized anticlusters such that the sum of distances within anticlusters is maximized. The anticlustering problem is NP-hard. We focus on anticlustering in Euclidean spaces, where the input data is tabular and each object is represented as a D-dimensional feature vector. Distances are measured as squared Euclidean distances between the respective vectors. Applications of Euclidean anticlustering include social studies, particularly in psychology, K-fold cross-validation in which each fold should be a good representative of the entire dataset, the creation of mini-batches for gradient descent in neural network training, and balanced K-cut partitioning. In particular, machine-learning applications involve million-scale datasets and very large values of K, making scalable anticlustering algorithms essential. Existing…
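To make the objective in the abstract concrete, the following is a minimal Python sketch that evaluates it for a given partition. It is not the ABA algorithm, which this summary does not describe. The function names `anticlustering_objective` and `random_balanced_labels` are hypothetical, and the sketch assumes each unordered pair of objects within an anticluster is counted once. It relies on the standard identity that the sum of pairwise squared Euclidean distances among n points equals n times the sum of their squared distances to the group centroid, which avoids forming all pairs.

```python
import numpy as np


def anticlustering_objective(X, labels):
    """Sum of within-anticluster squared Euclidean distances.

    Assumption: each unordered pair (i, j) in the same anticluster
    is counted once. Uses the identity
        sum_{i<j} ||x_i - x_j||^2 = n * sum_i ||x_i - mean||^2
    so the cost is linear in the number of objects.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        group = X[labels == k]
        centered = group - group.mean(axis=0)
        total += len(group) * np.sum(centered ** 2)
    return float(total)


def random_balanced_labels(n, K, seed=0):
    """Hypothetical baseline: a random partition whose anticluster
    sizes differ by at most one (equal when K divides n)."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n) % K
    rng.shuffle(labels)
    return labels


# Four points on a line, split into K = 2 anticlusters of size 2.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
mixed = anticlustering_objective(X, [0, 1, 1, 0])      # pairs {0,3} and {1,2}: 9 + 1
clustered = anticlustering_objective(X, [0, 0, 1, 1])  # pairs {0,1} and {2,3}: 1 + 1
```

In this toy case the mixed partition scores 10 and the clustered one scores 2, so a good anticlustering is the one that spreads similar objects across different groups.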
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Advanced Clustering Algorithms Research
