TL;DR
This paper introduces Fair-Lloyd, a modified k-means algorithm that ensures equitable clustering costs across different groups, addressing bias issues in standard k-means with minimal computational overhead.
Contribution
The paper proposes a fair k-means objective and an efficient algorithm, Fair-Lloyd, a modification of Lloyd's heuristic that chooses cluster centers to provide equitable costs for different groups.
Findings
On benchmark datasets, Fair-Lloyd gives all groups equal costs in the output k-clustering.
Fair-Lloyd inherits the simplicity, efficiency, and stability of Lloyd's heuristic.
Its running time is only negligibly higher than that of standard Lloyd's.
Abstract
We show that the popular k-means clustering algorithm (Lloyd's heuristic), used for a variety of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., demographic groups). Such biased clusterings can have deleterious implications for human-centric applications such as resource allocation. We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups. The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means, inheriting its simplicity, efficiency, and stability. In comparison with standard Lloyd's, we find that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.
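To make the notion of "equitable costs for different groups" concrete, here is a minimal sketch in Python. The abstract does not spell out the formula, so the code assumes one plausible reading: each group's cost is its average squared distance to the nearest center, and the fair objective is the largest of these per-group costs. The function names (`group_costs`, `fair_cost`) and the toy data are hypothetical and are not taken from the paper. The sketch only evaluates a given set of centers; it does not implement Fair-Lloyd itself.

```python
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def group_costs(points, groups, centers):
    """Average squared distance to the nearest center, computed per group.

    Assumption: this per-group average is what 'cost' means for a group.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p, g in zip(points, groups):
        totals[g] += min(sq_dist(p, c) for c in centers)
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def fair_cost(points, groups, centers):
    """Assumed fair objective: the worst (largest) per-group average cost."""
    return max(group_costs(points, groups, centers).values())


# Toy 1-D data with k = 1: group A has four points at 0, group B has two at 6.
points = [(0.0,), (0.0,), (0.0,), (0.0,), (6.0,), (6.0,)]
groups = ["A", "A", "A", "A", "B", "B"]

kmeans_center = [(2.0,)]  # overall mean, the standard k-means optimum for k = 1
balanced_center = [(3.0,)]  # midpoint, which equalizes the two group costs

print(group_costs(points, groups, kmeans_center))
print(group_costs(points, groups, balanced_center))
```

In this toy case the overall mean sits closer to the larger group, so group B pays an average cost of 16 against 4 for group A. Moving the center to the midpoint raises the total cost slightly but gives both groups an average cost of 9, which is the kind of trade-off the abstract describes.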
Taxonomy
Methods: k-Means Clustering
