Fair Bayesian Data Selection via Generalized Discrepancy Measures

Yixuan Zhang; Jiabin Luo; Zhenggang Wang; Feng Zhou; Quyu Kong

arXiv:2511.07032 · cs.LG · November 11, 2025

TL;DR

This paper introduces a Bayesian data selection method that aligns group-specific distributions with a shared central distribution using flexible discrepancy measures, improving both fairness and accuracy in machine learning.

Contribution

It proposes a novel, scalable, data-centric fairness framework that uses generalized discrepancy measures to align distributions without explicit fairness constraints.
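
The data-centric idea can be illustrated with per-example sample weights, which the abstract says receive their own posterior distribution alongside the model parameters. The sketch below is a minimal illustration under assumptions that are not taken from the paper: a logistic-regression likelihood, and weights w_i in [0, 1] that multiply each example's log-likelihood term, so w_i = 0 drops an example and w_i = 1 keeps it at full strength. The names `log_sigmoid` and `weighted_log_likelihood` are hypothetical, and the prior and posterior over the weights are not shown.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    # For negative z, rewrite to avoid overflow in exp(-z).
    return z - math.log1p(math.exp(z))


def weighted_log_likelihood(theta, data, weights):
    """Logistic-regression log-likelihood where each example's term is scaled
    by a selection weight w_i in [0, 1] (hypothetical weighting scheme).

    data: list of (x, y) pairs, x a list of floats, y in {0, 1}.
    """
    if len(data) != len(weights):
        raise ValueError("need one weight per example")
    total = 0.0
    for (x, y), w in zip(data, weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        z = sum(t * xi for t, xi in zip(theta, x))
        # log p(y=1) = log sigmoid(z); log p(y=0) = log sigmoid(-z)
        total += w * log_sigmoid(z if y == 1 else -z)
    return total
```

With `theta = [0.0, 0.0]` every example has probability 0.5, so weights `[1.0, 0.5]` give a total of 1.5 · log 0.5, and all-zero weights give 0.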

Findings

1. Outperforms existing fairness methods on both accuracy and fairness metrics.
2. Supports flexible distribution alignment via the Wasserstein distance, maximum mean discrepancy (MMD), and $f$-divergences.
3. Provides theoretical guarantees for fairness improvements.
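
The three discrepancy families named above can be estimated from samples roughly as follows. This is a minimal sketch, not the paper's implementation: it assumes one-dimensional samples, equal sample sizes for the Wasserstein estimate, a Gaussian kernel with a hand-picked `bandwidth` for MMD, and uses the closed-form KL divergence between two univariate Gaussians as the example $f$-divergence (the case $f(t) = t \log t$). All function names are hypothetical.

```python
import math


def wasserstein1_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.
    In 1-D the optimal coupling matches sorted values, so W1 is the mean
    absolute difference of the sorted samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mmd2_gaussian(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian RBF kernel."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2):
    the f-divergence with f(t) = t log t."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)
```

The choice of measure changes the geometry of the alignment: the Wasserstein estimate grows linearly with a location shift between samples, whereas this MMD estimate is bounded by 2 however far apart the samples are, and the KL divergence requires (here, parametric) densities rather than raw samples.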

Abstract

Fairness concerns are increasingly critical as machine learning models are deployed in high-stakes applications. While existing fairness-aware methods typically intervene at the model level, they often suffer from high computational costs, limited scalability, and poor generalization. To address these challenges, we propose a Bayesian data selection framework that ensures fairness by aligning group-specific posterior distributions of model parameters and sample weights with a shared central distribution. Our framework supports flexible alignment via various distributional discrepancy measures, including Wasserstein distance, maximum mean discrepancy, and $f$-divergence, allowing geometry-aware control without imposing explicit fairness constraints. This data-centric approach mitigates group-specific biases in training data and improves fairness in downstream tasks, with theoretical…
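
The alignment idea can be sketched as a penalty that sums, over groups, a discrepancy between each group's posterior and the shared central distribution, added to a data-fit term rather than imposed as a hard fairness constraint. This is an illustrative sketch under stated assumptions: each posterior is represented by an equal-size set of draws of a single scalar parameter, the discrepancy is the 1-D Wasserstein distance, and the central draws are simply passed in (how the paper obtains the central distribution is not shown). The names `alignment_penalty`, `fair_objective`, and `lam` are hypothetical.

```python
def wasserstein1_1d(xs, ys):
    """Empirical 1-Wasserstein distance between equal-size 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def alignment_penalty(group_samples, central_samples, discrepancy=wasserstein1_1d):
    """Sum over groups of D(group posterior draws, central draws).

    group_samples: dict mapping a group label to posterior draws of one
    scalar parameter (hypothetical input format).
    """
    return sum(discrepancy(s, central_samples) for s in group_samples.values())


def fair_objective(neg_log_lik, group_samples, central_samples, lam=1.0):
    """Hypothetical soft-penalty objective: data fit plus lam * alignment."""
    return neg_log_lik + lam * alignment_penalty(group_samples, central_samples)
```

For example, with groups `{"a": [0, 1, 2], "b": [2, 3, 4]}` and central draws `[1, 2, 3]`, each group is one unit away from the center, so the penalty is 2.0. Any sample-based discrepancy with the same two-argument signature, such as an MMD estimator, can be passed as `discrepancy` instead.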

Videos

Fair Bayesian Data Selection via Generalized Discrepancy Measures · Underline

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)