Towards Fair Representation: Clustering and Consensus

Diptarka Chakraborty; Kushagra Chatterjee; Debarati Das; Tien Long Nguyen; Romina Nobahari

arXiv:2506.08673·cs.LG·June 18, 2025

TL;DR

This paper introduces the first constant-factor approximation algorithms for fair consensus clustering, which seeks a single clustering that both represents the input clusterings and is fair with respect to protected attributes.

Contribution

The paper presents novel algorithms for fair consensus clustering, including an optimal algorithm for datasets with equal group representation and approximation algorithms for unequal groups, along with NP-hardness results.

Findings

01

Provides the first constant-factor approximation algorithms for fair consensus clustering.

02

Developed an optimal algorithm for datasets with equal group representation.

03

Proved NP-hardness for the problem with unequal group sizes.

Abstract

Consensus clustering, a fundamental task in machine learning and data analysis, aims to aggregate multiple input clusterings of a dataset, potentially based on different non-sensitive attributes, into a single clustering that best represents the collective structure of the data. In this work, we study this fundamental problem through the lens of fair clustering, as introduced by Chierichetti et al. [NeurIPS'17], which incorporates the disparate impact doctrine to ensure proportional representation of each protected group in the dataset within every cluster. Our objective is to find a consensus clustering that is not only representative but also fair with respect to specific protected attributes. To the best of our knowledge, we are the first to address this problem and provide a constant-factor approximation. As part of our investigation, we examine how to minimally modify an existing…
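To make the two requirements in the abstract concrete, here is a minimal sketch in Python. It is illustrative only and is not the paper's algorithm. It assumes that "fair" means every cluster contains each protected group in exactly the same proportion as the whole dataset, and that the disagreement between two clusterings is measured by counting pairs of points that are together in one and separated in the other. The abstract does not state the distance, so that choice is an assumption, as are the function names `is_fair`, `pair_distance`, and `consensus_cost`.

```python
from collections import Counter
from itertools import combinations


def is_fair(clustering, groups):
    """Return True if every cluster mirrors the dataset's group proportions.

    clustering[i] is the cluster label of point i; groups[i] is its
    protected-group label. Exact proportionality is an assumption here.
    """
    n = len(groups)
    total = Counter(groups)
    per_cluster = {}
    for cluster, group in zip(clustering, groups):
        per_cluster.setdefault(cluster, Counter())[group] += 1
    for counts in per_cluster.values():
        size = sum(counts.values())
        for group, group_total in total.items():
            # Compare counts/size with group_total/n without using floats.
            if counts[group] * n != group_total * size:
                return False
    return True


def pair_distance(a, b):
    """Count point pairs on which clusterings a and b disagree (assumed metric)."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def consensus_cost(candidate, inputs):
    """Sum of distances from a candidate clustering to all input clusterings."""
    return sum(pair_distance(candidate, c) for c in inputs)


# Hypothetical toy data: four points, two protected groups of equal size.
groups = ["red", "blue", "red", "blue"]
balanced = [0, 0, 1, 1]    # each cluster holds one red and one blue point
unbalanced = [0, 1, 0, 1]  # each cluster holds a single group

print(is_fair(balanced, groups))
print(is_fair(unbalanced, groups))
print(consensus_cost(balanced, [balanced, unbalanced]))
```

Under these assumptions, a fair consensus clustering would be a candidate that passes `is_fair` while keeping `consensus_cost` as small as possible. The sketch only evaluates candidates; it does not search for one.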

Taxonomy

Topics: Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing · Privacy-Preserving Technologies in Data