Fair Clustering Through Fairlets
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

TL;DR
This paper introduces fairlets, minimal sets of points that satisfy fair representation, as a building block for fair clustering in which each protected class has approximately equal representation in every cluster. It gives approximation algorithms for the resulting problems and validates them empirically on real datasets.
Contribution
It formulates fair clustering under the $k$-center and $k$-median objectives, introduces fairlets for decomposition, and develops efficient approximation algorithms.
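One way to make "approximately equal representation" concrete in the two-class case is a balance measure. The notation below ($R(Y)$ and $B(Y)$ for the points of each class in a set $Y$, and $\mathcal{C}$ for a clustering) is an assumption introduced here for illustration, not taken from this summary:

$$\mathrm{balance}(Y) = \min\left(\frac{|R(Y)|}{|B(Y)|},\ \frac{|B(Y)|}{|R(Y)|}\right) \in [0, 1], \qquad \mathrm{balance}(\mathcal{C}) = \min_{C \in \mathcal{C}} \mathrm{balance}(C).$$

Under this reading, a fair clustering minimizes the usual $k$-center or $k$-median cost subject to a lower bound on $\mathrm{balance}(\mathcal{C})$. A balance of 1 means every cluster contains the two classes in equal numbers, and a balance of 0 means some cluster contains only one class.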
Findings
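Fair clustering can be achieved by first decomposing the data into fairlets, using flow-based algorithms, and then running a traditional clustering algorithm on the result.
Although finding good fairlets can be NP-hard, efficient approximation algorithms yield near-optimal fair clusterings.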
Empirical results demonstrate the effectiveness of fair clustering on real datasets.
Abstract
We study the question of fair clustering under the disparate impact doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the $k$-center and the $k$-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions: for instance, a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation…
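The two-step decomposition described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the paper's algorithm: it assumes exactly two classes with equal numbers of points, builds one-to-one fairlets with a simple greedy nearest-neighbour pairing (a heuristic swapped in here in place of an optimized fairlet decomposition), and then runs a standard greedy farthest-point $k$-center on one representative per fairlet. All function names (`balance`, `greedy_fairlets`, `greedy_k_center`, `fair_cluster`) are hypothetical.

```python
from collections import Counter
from math import dist


def balance(labels):
    """Balance of one cluster over two classes: min(a/b, b/a), or 0.0 if a class is absent."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0
    a, b = counts.values()  # assumes exactly two classes are present
    return min(a / b, b / a)


def greedy_fairlets(red, blue):
    """Pair each red point with its nearest unused blue point (heuristic, not optimal)."""
    if len(red) != len(blue):
        raise ValueError("this sketch assumes equally sized classes")
    unused = list(range(len(blue)))
    fairlets = []
    for r in red:
        j = min(unused, key=lambda i: dist(r, blue[i]))
        unused.remove(j)
        fairlets.append((r, blue[j]))
    return fairlets


def greedy_k_center(points, k):
    """Farthest-point traversal: repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def fair_cluster(red, blue, k):
    """Decompose into fairlets, cluster the representatives, then assign whole fairlets."""
    fairlets = greedy_fairlets(red, blue)
    reps = [r for r, _ in fairlets]  # assumption: the red point represents its fairlet
    if k > len(reps):
        raise ValueError("k cannot exceed the number of fairlets")
    centers = greedy_k_center(reps, k)
    clusters = [[] for _ in centers]
    for r, b in fairlets:
        idx = min(range(len(centers)), key=lambda i: dist(r, centers[i]))
        clusters[idx] += [("red", r), ("blue", b)]  # both points follow the representative
    return clusters


red = [(0, 0), (1, 0), (10, 0), (11, 0)]
blue = [(0, 1), (1, 1), (10, 1), (11, 1)]
clusters = fair_cluster(red, blue, k=2)
for c in clusters:
    print(balance([label for label, _ in c]), c)
```

Because each fairlet is moved as a unit, every non-empty cluster is a union of balanced pairs and so inherits their balance. The price is that a point may land in a cluster whose center is not its nearest one, which is the behaviour the abstract flags as violating common conventions.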
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Statistical Methods and Inference · Imbalanced Data Classification Techniques
