A Computational Approach to Improving Fairness in K-means Clustering
Guancheng Zhou, Haiping Xu, Hongkang Xu, Chenyu Li, Donghui Yan

TL;DR
This paper introduces a two-stage optimization method to enhance fairness in K-means clustering by adjusting cluster memberships, addressing bias related to sensitive attributes with minimal impact on clustering quality.
Contribution
It proposes two computationally efficient algorithms for identifying the data points that are most costly for fairness, so that their cluster memberships can be adjusted to improve fairness in K-means clustering.
Findings
Significant fairness improvements on benchmark datasets
Minimal impact on clustering quality
Algorithms extendable to other clustering methods
Abstract
The popular K-means clustering algorithm has a potential major weakness when its output is used for further analysis or interpretation: some clusters may contain disproportionately more (or fewer) points from one of the subpopulations defined by a sensitive variable, e.g., gender or race. Such a fairness issue may cause bias and unexpected social consequences. This work attempts to improve the fairness of K-means clustering with a two-stage optimization formulation: clustering first and then adjusting the cluster membership of a small subset of selected data points. Two computationally efficient algorithms are proposed for identifying the data points that are expensive for fairness, one focusing on the nearest data points outside of a cluster and the other on highly 'mixed' data points. Experiments on benchmark datasets show substantial improvement in fairness with minimal impact on clustering quality.…
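To make the two-stage idea concrete, here is a minimal Python sketch of the second stage only. It is not the authors' method. It assumes a clustering (labels and centers) has already been produced by any K-means implementation, and it uses a brute-force greedy search rather than either of the two algorithms proposed in the paper. The function names `imbalance` and `adjust_memberships`, the `budget` parameter, and the imbalance measure itself (the largest gap between a group's share inside a cluster and its overall share) are illustrative assumptions, not definitions taken from the paper.

```python
from collections import Counter
from math import dist


def imbalance(labels, groups, k):
    """Assumed fairness measure: largest gap between a group's share
    inside any cluster and that group's share in the whole dataset."""
    n = len(labels)
    overall = Counter(groups)
    worst = 0.0
    for c in range(k):
        members = [g for l, g in zip(labels, groups) if l == c]
        if not members:
            continue
        counts = Counter(members)
        for g, total in overall.items():
            worst = max(worst, abs(counts[g] / len(members) - total / n))
    return worst


def adjust_memberships(points, labels, centers, groups, budget):
    """Hypothetical second stage: reassign at most `budget` points.
    Each step picks the single move that lowers imbalance the most,
    breaking ties by the smallest increase in squared distance."""
    labels = list(labels)
    k = len(centers)
    for _ in range(budget):
        current = imbalance(labels, groups, k)
        sizes = Counter(labels)
        best = None  # (new imbalance, extra cost, point index, target cluster)
        for i, p in enumerate(points):
            old = labels[i]
            if sizes[old] <= 1:
                continue  # never empty a cluster
            for c in range(k):
                if c == old:
                    continue
                labels[i] = c  # try the move
                extra = dist(p, centers[c]) ** 2 - dist(p, centers[old]) ** 2
                cand = (imbalance(labels, groups, k), extra, i, c)
                labels[i] = old  # undo the move
                if cand[0] < current - 1e-12 and (best is None or cand < best):
                    best = cand
        if best is None:
            break  # no single move improves the measure
        labels[best[2]] = best[3]
    return labels


# Toy data: two clusters on a line, each dominated by one group.
points = [(0, 0), (1, 0), (2, 0), (4, 0), (6, 0), (8, 0), (9, 0), (10, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
centers = [(1.75, 0), (8.25, 0)]
groups = ["a", "a", "a", "b", "a", "b", "b", "b"]

before = imbalance(labels, groups, 2)
new_labels = adjust_memberships(points, labels, centers, groups, budget=2)
after = imbalance(new_labels, groups, 2)
print(before, after, new_labels)
```

The sketch re-evaluates the imbalance for every candidate move, which is far more expensive than necessary. The paper's contribution is precisely to narrow the candidate set efficiently (nearest points outside a cluster, or highly 'mixed' points), so this brute-force loop only illustrates the trade-off between fairness gain and added clustering cost.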
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Customer churn and segmentation
Methods: k-Means Clustering
