TL;DR
This paper introduces scalable algorithms for fair clustering that balance cost and fairness, outperforming existing methods and enabling practical application to large datasets.
Contribution
It presents a general framework with three heuristics for fair clustering, offering improved control, scalability, and solution quality compared to prior approaches.
Findings
The heuristics outperform existing methods on benchmark datasets.
The maximum-scalability heuristic handles millions of objects in seconds.
The framework provides precise control over the cost-fairness trade-off.
Abstract
Clustering is an unsupervised machine learning task that consists of identifying groups of similar objects. It has numerous applications and is increasingly used in fairness-sensitive domains where objects represent individuals, such as customers, employees, or students. We address a fair clustering problem in which objects belong to protected groups. The problem consists of partitioning the objects into a predefined number of clusters while attaining a user-defined target level of fairness, meaning that each protected group is sufficiently represented in each cluster. The objective is to minimize the clustering cost, defined as the sum of squared Euclidean distances between the objects and the centers of their clusters. Since clustering cost and fairness are generally in conflict, managing the trade-off between them is essential in practical applications. Existing methods provide…
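To make the two competing quantities concrete, here is a minimal Python sketch. The cost function follows the abstract's definition (sum of squared Euclidean distances between objects and their cluster centers, with each center taken as the cluster mean). The fairness function is an assumption made for illustration only: it reports the smallest ratio, over all clusters and protected groups, between a group's share inside a cluster and its share in the whole dataset. The abstract does not state the paper's exact fairness measure, and the names `clustering_cost` and `min_representation_ratio` are hypothetical.

```python
import numpy as np


def clustering_cost(X, labels, k):
    """Sum of squared Euclidean distances between objects and their cluster means."""
    cost = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) == 0:
            continue  # an empty cluster contributes no cost
        center = members.mean(axis=0)
        cost += ((members - center) ** 2).sum()
    return float(cost)


def min_representation_ratio(labels, groups, k):
    """ASSUMED fairness measure (not taken from the paper).

    For every cluster and protected group, divide the group's share inside
    the cluster by its share in the full dataset, and return the minimum.
    1.0 means every cluster mirrors the dataset; 0.0 means some group is
    missing from some cluster (or a cluster is empty).
    """
    ratios = []
    for g in np.unique(groups):
        overall_share = np.mean(groups == g)
        for c in range(k):
            in_cluster = groups[labels == c]
            if len(in_cluster) == 0:
                return 0.0
            ratios.append(np.mean(in_cluster == g) / overall_share)
    return float(min(ratios))


# Toy data: two tight spatial pairs, each pair drawn from a single group.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
groups = np.array([0, 0, 1, 1])

compact = np.array([0, 0, 1, 1])  # follows the geometry
mixed = np.array([0, 1, 0, 1])    # puts one object of each group in each cluster

for name, labels in [("compact", compact), ("mixed", mixed)]:
    print(name, clustering_cost(X, labels, 2), min_representation_ratio(labels, groups, 2))
```

On this toy data the geometrically compact assignment has the lower cost but leaves each cluster with a single group, while the mixed assignment represents both groups in both clusters at a much higher cost. This is the conflict between cost and fairness that the abstract describes.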
