Socially Fair k-Means Clustering

Mehrdad Ghadiri; Samira Samadi; Santosh Vempala

arXiv:2006.10085·cs.LG·October 30, 2020

TL;DR

This paper introduces Fair-Lloyd, a modification of Lloyd's heuristic for k-means that chooses cluster centers with equitable costs across groups (e.g., demographic groups), at a negligible increase in running time over standard k-means.

Contribution

The paper proposes a fair k-means objective and an efficient algorithm, Fair-Lloyd, that chooses cluster centers providing equitable costs for different groups.

Findings

01 On benchmark datasets, Fair-Lloyd equalizes the clustering cost across groups in the output k-clustering.

02 Fair-Lloyd inherits the simplicity, efficiency, and stability of standard Lloyd's heuristic.

03 The increase in running time over standard Lloyd's is negligible.

Abstract

We show that the popular k-means clustering algorithm (Lloyd's heuristic), used for a variety of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., demographic groups). Such biased clusterings can have deleterious implications for human-centric applications such as resource allocation. We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups. The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means, inheriting its simplicity, efficiency, and stability. In comparison with standard Lloyd's, we find that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.
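
To make "equitable costs for different groups" concrete, here is a minimal sketch of how per-group clustering costs could be measured for a fixed set of centers. It assumes that a group's cost is the average squared distance from its points to their nearest center, and that the fair objective is the largest such group cost; the abstract does not spell out either definition, so treat both as assumptions. The function names `group_costs` and `fair_cost` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def group_costs(points, groups, centers):
    """Average squared distance to the nearest center, per group.

    Assumption: a group's cost is the mean k-means cost of its points.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    groups = np.asarray(groups)

    # Squared Euclidean distance from every point to every center.
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)

    # Each point pays the squared distance to its closest center.
    nearest = sq_dists.min(axis=1)

    return {g.item(): float(nearest[groups == g].mean())
            for g in np.unique(groups)}


def fair_cost(points, groups, centers):
    """Assumed fair objective: the largest average cost over all groups."""
    return max(group_costs(points, groups, centers).values())


# Toy data: group 0 is spread out, group 1 sits exactly on a center.
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [10.0, 0.0]]
groups = [0, 0, 1, 1]
centers = [[0.0, 0.0], [10.0, 0.0]]

costs = group_costs(points, groups, centers)
worst = fair_cost(points, groups, centers)
print(costs, worst)
```

In this toy example the two groups pay different average costs for the same centers, which is the kind of gap the abstract describes as unfavorable to a subgroup. Under the assumption above, a fair method would prefer centers that lower the larger of the two values.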

Taxonomy

Methods: k-Means Clustering