Communication-Avoiding Linear Algebraic Kernel K-Means on GPUs

Julian Bellavita; Matthew Rubino; Nakul Iyer; Andrew Chang; Aditya Devarakonda; Flavio Vella; Giulia Guidi

arXiv:2601.17136 · cs.DC · January 29, 2026

TL;DR

This paper introduces distributed-memory parallel algorithms for large-scale Kernel K-means clustering on multi-GPU systems, enabling efficient clustering of datasets with millions of samples.
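A quick calculation shows why a single GPU runs out of room. This is a back-of-the-envelope sketch. It assumes the n x n kernel matrix is stored densely in single precision (4 bytes per entry) and ignores every other buffer. The paper's actual storage format and precision may differ, and the helper names are hypothetical. Under that assumption, 80,000 samples already need 25.6 GB for the kernel matrix alone. For 1,000,000 samples the matrix needs 4 TB, or about 15.6 GB per GPU if it is split evenly over 256 GPUs.

```python
# Hypothetical helpers for sizing a dense kernel matrix; not taken from the paper.

def dense_kernel_bytes(n: int, bytes_per_entry: int = 4) -> int:
    """Bytes for a dense n x n kernel matrix (4 bytes per entry = float32)."""
    return n * n * bytes_per_entry


def per_gpu_bytes(n: int, num_gpus: int, bytes_per_entry: int = 4) -> float:
    """Bytes per GPU if the dense kernel matrix is split evenly across num_gpus."""
    return dense_kernel_bytes(n, bytes_per_entry) / num_gpus


GB = 10**9  # decimal gigabytes

for n in (80_000, 1_000_000):
    print(f"n = {n:,}: {dense_kernel_bytes(n) / GB:,.1f} GB total, "
          f"{per_gpu_bytes(n, 256) / GB:,.2f} GB per GPU on 256 GPUs")
```

The quadratic growth is the key point. Doubling n quadruples the memory, so the kernel matrix has to be partitioned across devices before million-sample inputs can fit.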

Contribution

It presents communication-efficient distributed algorithms and partitioning schemes tailored for Kernel K-means, significantly improving scalability and performance on multi-GPU systems.
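The following hedged illustration shows how partitioning the kernel matrix can keep communication small. It simulates a generic 1D block-row partition in a single process. This is not necessarily one of the paper's schemes. The sketch assumes every rank holds a copy of the inputs X and the current labels, uses an RBF kernel, and replaces the collective with a plain sum. The names rbf_kernel_block, normalized_indicator, and distances_1d are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a 1D block-row partition of K; not the paper's code.


def rbf_kernel_block(A, B, gamma):
    """Gaussian kernel block: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def normalized_indicator(labels, k):
    """V[i, c] = 1/|c| if point i is in cluster c, else 0 (dense for clarity)."""
    n = labels.shape[0]
    counts = np.maximum(np.bincount(labels, minlength=k), 1)  # avoid 1/0 for empty clusters
    V = np.zeros((n, k))
    V[np.arange(n), labels] = 1.0 / counts[labels]
    return V


def distances_1d(X, labels, k, num_ranks, gamma=1.0):
    """Squared feature-space distances D[i, c], computed as if K were split by rows."""
    n = X.shape[0]
    V = normalized_indicator(labels, k)  # every rank rebuilds V from the shared labels
    local, partial_z = [], []
    for rows in np.array_split(np.arange(n), num_ranks):  # one iteration per simulated rank
        K_p = rbf_kernel_block(X[rows], X, gamma)  # this rank's (n_p x n) block row of K
        E_p = K_p @ V                              # its rows of K V, no communication needed
        diag_p = K_p[np.arange(len(rows)), rows]   # its entries of diag(K)
        local.append((rows, diag_p, E_p))
        partial_z.append((V[rows] * E_p).sum(axis=0))  # partial sums of diag(V^T K V)
    z = np.sum(partial_z, axis=0)  # stands in for an allreduce of only k numbers
    D = np.empty((n, k))
    for rows, diag_p, E_p in local:
        D[rows] = diag_p[:, None] - 2.0 * E_p + z[None, :]
    return D
```

In this layout each rank computes its rows of K V locally. Only the k per-cluster sums of diag(V^T K V) cross ranks. Each rank can also take `argmin(axis=1)` over its own rows of D before the new labels are gathered again.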

Findings

1. Achieves up to 79.7% weak scaling efficiency on 256 GPUs.
2. Reduces clustering time from over an hour to under two seconds.
3. Provides up to 3.6x speedup over previous algorithms.
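The scaling figures above can be read with the standard definitions below. This is a hedged sketch. It assumes weak scaling efficiency is measured against a one-GPU run with the per-GPU problem size held fixed, which may differ from the paper's exact baseline. T(P) denotes the runtime on P GPUs, and T_prior and T_new denote the runtimes of a previous algorithm and the new one.

```latex
E_{\text{weak}}(P) = \frac{T(1)}{T(P)}, \qquad
E_{\text{weak}}(256) = 0.797 \;\Rightarrow\; T(256) = \frac{T(1)}{0.797} \approx 1.25\,T(1)

S = \frac{T_{\text{prior}}}{T_{\text{new}}}, \qquad
S = 3.6 \;\Rightarrow\; T_{\text{new}} = \frac{T_{\text{prior}}}{3.6} \approx 0.28\,T_{\text{prior}}
```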

Abstract

Clustering is an important tool in data analysis, with K-means being popular for its simplicity and versatility. However, it cannot handle non-linearly separable clusters. Kernel K-means addresses this limitation but requires a large kernel matrix, making it computationally and memory intensive. Prior work has accelerated Kernel K-means by formulating it using sparse linear algebra primitives and implementing it on a single GPU. However, that approach cannot run on datasets with more than approximately 80,000 samples due to limited GPU memory. In this work, we address this issue by presenting a suite of distributed-memory parallel algorithms for large-scale Kernel K-means clustering on multi-GPU systems. Our approach maps the most computationally expensive components of Kernel K-means onto communication-efficient distributed linear algebra primitives uniquely tailored for Kernel…
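The abstract describes formulating Kernel K-means with linear algebra primitives. Below is a minimal single-device sketch of that idea under our own assumptions: dense NumPy arrays instead of sparse matrices, an RBF kernel, a caller-supplied initial labeling, and hypothetical function names. It computes every point-to-centroid distance from products with the kernel matrix K and a normalized cluster-indicator matrix V, using ||phi(x_i) - mu_c||^2 = K_ii - 2 (K V)_ic + (V^T K V)_cc.

```python
import numpy as np

# Hypothetical sketch of Lloyd-style Kernel K-means in matrix form; not the paper's code.


def rbf_gram(X, gamma):
    """Dense n x n Gaussian kernel K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = (X**2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def kernel_kmeans(K, k, labels, max_iter=100):
    """Return final labels and the objective (sum of squared distances) per iteration."""
    n = K.shape[0]
    diag_K = np.diag(K)
    labels = np.asarray(labels).copy()
    history = []
    for _ in range(max_iter):
        counts = np.maximum(np.bincount(labels, minlength=k), 1)
        # V[i, c] = 1/|c| when point i belongs to cluster c (sparse in a real implementation).
        V = np.zeros((n, k))
        V[np.arange(n), labels] = 1.0 / counts[labels]
        E = K @ V                                  # (K V)[i, c]: mean kernel value of i with cluster c
        z = (V * E).sum(axis=0)                    # diag(V^T K V): squared norm of each centroid
        D = diag_K[:, None] - 2.0 * E + z[None, :]  # squared feature-space distances
        history.append(D[np.arange(n), labels].sum())  # objective for the current labels
        new_labels = D.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, history
```

For example, `kernel_kmeans(rbf_gram(X, gamma=0.5), k=2, labels=initial_labels)` returns one label per row of X plus the objective after each assignment step. With this update rule the objective does not increase from one iteration to the next. The dense n x n matrix K is exactly what stops fitting on one GPU as n grows.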

Taxonomy

Topics: Advanced Clustering Algorithms Research · Stochastic Gradient Optimization Techniques · Cloud Computing and Resource Management