Fair Bayesian Model-Based Clustering

Jihu Lee; Kunwoong Kim; Yongdai Kim

arXiv:2506.12839·stat.ML·June 17, 2025

Open Access 3 Reviews

TL;DR

This paper introduces Fair Bayesian Clustering (FBC), a novel model-based approach that infers the number of clusters and ensures group fairness, applicable to various data types with improved fairness and utility trade-offs.

Contribution

The paper proposes a new Bayesian clustering method with a fairness prior, capable of inferring the number of clusters and handling diverse data types.

Findings

01

FBC reasonably infers the number of clusters.

02

FBC achieves a competitive utility-fairness trade-off.

03

FBC performs well on categorical data.

Abstract

Fair clustering has become a socially significant task with the advancement of machine learning technologies and the growing demand for trustworthy AI. Group fairness ensures that the proportions of each sensitive group are similar in all clusters. Most existing group-fair clustering methods are based on $K$-means clustering and thus require the distance between instances and the number of clusters to be given in advance. To resolve this limitation, we propose a fair Bayesian model-based clustering called Fair Bayesian Clustering (FBC). We develop a specially designed prior which puts its mass only on fair clusters, and implement an efficient MCMC algorithm. Advantages of FBC are that it can infer the number of clusters and can be applied to any data type as long as the likelihood is defined (e.g., categorical data). Experiments on real-world datasets show that FBC (i) reasonably…
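To make the group-fairness notion in the abstract concrete, the sketch below checks how far each sensitive group's share within a cluster departs from its share in the whole dataset. It is only an illustration: the function names `group_proportions` and `max_proportion_gap` and the toy labels are assumptions introduced here, and this gap is not necessarily the fairness measure the paper reports.

```python
from collections import Counter


def group_proportions(labels, groups):
    """Return {cluster: {group: share of that cluster}} (hypothetical helper)."""
    counts = {}
    for cluster, group in zip(labels, groups):
        counts.setdefault(cluster, Counter())[group] += 1
    return {
        cluster: {g: n / sum(cnt.values()) for g, n in cnt.items()}
        for cluster, cnt in counts.items()
    }


def max_proportion_gap(labels, groups):
    """Largest absolute gap between a group's share in any cluster
    and its share in the full dataset; 0.0 means perfectly proportional."""
    overall = Counter(groups)
    total = len(groups)
    props = group_proportions(labels, groups)
    return max(
        abs(props[cluster].get(g, 0.0) - overall[g] / total)
        for cluster in props
        for g in overall
    )


# Toy data (assumed): two clusters of four points, two sensitive groups.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
fair_groups = ["a", "b", "a", "b", "a", "b", "a", "b"]
unfair_groups = ["a", "a", "a", "b", "b", "b", "a", "b"]

fair_gap = max_proportion_gap(labels, fair_groups)
unfair_gap = max_proportion_gap(labels, unfair_groups)
print(fair_gap, unfair_gap)
```

A gap of zero corresponds to every cluster mirroring the dataset-wide group proportions; larger values indicate clusters that over- or under-represent some group.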

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper is among the first to integrate group fairness constraints into a Bayesian mixture modeling framework (an existing model for clustering without fairness constraints), providing new insights into fairness-aware clustering and Bayesian nonparametrics.
2. The proposed fair prior through the matching map is sound, which allows inference via standard Bayesian sampling tools without strict constraints on parameters.
3. Unlike prior fair clustering methods (which are typically combined w

Weaknesses

1. Matching-based ideas have also been used in fairlet and fair $k$-means methods; here they are mainly reformulated as a Bayesian prior combined with MCMC. Although the paper claims to be the first to apply this idea in a Bayesian clustering setting, the core mechanism remains very similar to previous approaches, making it difficult to identify clear Bayesian advantages beyond inferring the number of clusters.
2. This paper lacks a comprehensive comparison with previous fair clustering mo

Reviewer 02 · Rating 6 · Confidence 3

Strengths

The paper tackles a well-known and practical shortcoming of the dominant K-means paradigm in fair clustering. The inflexibility of fixed $k$ is a major hurdle in real-world applications. The move to a Bayesian mixture model is also a natural and powerful way to solve these problems. The ability to infer $k$ is a significant advantage for exploratory analysis, and the model-based framework is inherently more adaptable to diverse data types than distance-based algorithms. The idea of a "fair prio

Weaknesses

Despite the earlier claim that the paper focuses on the "perfect fairness" scenario, in Section 2.1.2, when $n_1 = \beta n_0 + r$, the fairness constraint can actually be moderately violated. I think the constraint violation should be further quantified in the main body. Although the Bayesian fair prior and posterior update framework is new, the idea of "coupling" points from different sensitive groups follows the tradition in the fair clustering community, but is more difficult to follow in
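The remainder case $n_1 = \beta n_0 + r$ raised above can be illustrated with plain arithmetic. The sketch below assumes, purely for illustration, that each point of the smaller group is matched with $\beta$ points of the larger group; this is not confirmed as the paper's construction, and the group sizes and the helper name `split_group_sizes` are hypothetical. Under that assumption, a cluster built only from such matched units has group ratio $\beta$, which differs from the dataset-wide ratio $n_1 / n_0$ by $r / n_0$ whenever $r > 0$.

```python
from fractions import Fraction


def split_group_sizes(n0, n1):
    """Write n1 = beta * n0 + r with 0 <= r < n0 (hypothetical helper)."""
    beta, r = divmod(n1, n0)
    return beta, r


# Assumed group sizes for illustration only.
n0, n1 = 4, 10
beta, r = split_group_sizes(n0, n1)

overall_ratio = Fraction(n1, n0)   # dataset-wide ratio of group 1 to group 0
unit_ratio = Fraction(beta, 1)     # ratio inside one matched unit (1 vs. beta points)
gap = overall_ratio - unit_ratio   # equals r / n0 by construction

print(beta, r, overall_ratio, unit_ratio, gap)
```

With these sizes the gap is $1/2$, so exact proportionality is out of reach for clusters formed from matched units alone; how the paper treats the $r$ leftover points is not shown in this excerpt.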

Reviewer 03 · Rating 4 · Confidence 3

Strengths

### Compelling Theoretical Model + Results
- The connection between perfect group fairness and a matching map is the paper's strongest point. This is a very interesting reformulation of the fairness constraint, though I have some questions / concerns on this connection (see the following sections).
- To my knowledge this is the first work to take the Bayesian non-parametric approach to group fairness, which is a clear and important contribution.

### Experimental Results
- The model is demonstr

Weaknesses

### Model Issues
- The authors note early in the paper that "most existing fair clustering methods are based on the K-means clustering and thus require the distance between instances and the number of clusters to be given in advance," but the matching map and "energy" are directly functions of distances between matched points. It seems contradictory to re-introduce such a major dependency when the paper claims to be avoiding the limitations of $k$-means. If a reasonable distance is difficult to

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research