Fast and effective algorithms for fair clustering at scale

Claudio Mantuano; Manuel Kammermann; Philipp Baumann

arXiv:2605.13759·cs.LG·May 14, 2026

TL;DR

This paper introduces scalable algorithms for fair clustering that balance cost and fairness, outperforming existing methods and enabling practical application to large datasets.

Contribution

It presents a general framework with three heuristics for fair clustering, offering improved control, scalability, and solution quality compared to prior approaches.

Findings

01 The three heuristics outperform existing methods on benchmark datasets.

02 The heuristic designed for maximum scalability handles millions of objects in seconds.

03 The framework provides precise control over the cost-fairness trade-off.

Abstract

Clustering is an unsupervised machine learning task that consists of identifying groups of similar objects. It has numerous applications and is increasingly used in fairness-sensitive domains where objects represent individuals, such as customers, employees, or students. We address a fair clustering problem in which objects belong to protected groups. The problem consists of partitioning the objects into a predefined number of clusters while attaining a user-defined target level of fairness, meaning that each protected group is sufficiently represented in each cluster. The objective is to minimize the clustering cost, defined as the sum of squared Euclidean distances between the objects and the centers of their clusters. Since clustering cost and fairness are generally in conflict, managing the trade-off between them is essential in practical applications. Existing methods provide…
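To make the two quantities in the abstract concrete, here is a minimal Python sketch that evaluates a given cluster assignment. The clustering cost follows the abstract's definition (sum of squared Euclidean distances to cluster centers, with each center taken as the cluster mean). The fairness measure is an assumption for illustration only: the abstract says each protected group must be "sufficiently represented" in each cluster but does not give a formula here, so the hypothetical `min_representation_ratio` uses the smallest ratio between a group's share inside a cluster and its share in the whole dataset. The function names and the toy data are likewise not from the paper.

```python
import numpy as np


def clustering_cost(X, labels, k):
    """Sum of squared Euclidean distances between objects and their cluster means."""
    cost = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) == 0:
            continue  # an empty cluster contributes nothing
        center = members.mean(axis=0)
        cost += ((members - center) ** 2).sum()
    return float(cost)


def min_representation_ratio(labels, groups, k):
    """ASSUMED fairness measure (not taken from the paper).

    For every protected group and every non-empty cluster, compare the
    group's share inside the cluster with its share in the full dataset,
    and return the smallest such ratio. 1.0 means every cluster mirrors
    the dataset proportions; 0.0 means some group is absent from a cluster.
    """
    worst = np.inf
    for g in np.unique(groups):
        overall_share = np.mean(groups == g)
        for c in range(k):
            in_cluster = labels == c
            if not in_cluster.any():
                continue
            cluster_share = np.mean(groups[in_cluster] == g)
            worst = min(worst, cluster_share / overall_share)
    return float(worst)


# Toy data: two tight pairs of points, each pair made of a single group.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
groups = np.array([0, 0, 1, 1])

cheap = np.array([0, 0, 1, 1])  # follows the geometry, ignores the groups
fair = np.array([0, 1, 0, 1])   # mixes the groups, ignores the geometry

print(clustering_cost(X, cheap, 2), min_representation_ratio(cheap, groups, 2))
# cost 1.0, ratio 0.0
print(clustering_cost(X, fair, 2), min_representation_ratio(fair, groups, 2))
# cost 100.0, ratio 1.0
```

On this toy data the low-cost assignment leaves each cluster with a single group, while the balanced assignment raises the cost from 1.0 to 100.0. This is the conflict between clustering cost and fairness that the abstract describes.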
