Relative Error Fair Clustering in the Weak-Strong Oracle Model

Vladimir Braverman; Prathamesh Dharangutte; Shaofeng H.-C. Jiang; Hoai-An Nguyen; Chen Wang; Yubo Zhang; Samson Zhou

arXiv:2506.12287 · cs.DS · December 22, 2025

TL;DR

This paper introduces a new approach to fair clustering that balances costly, accurate distance measurements against cheap, inaccurate ones, achieving near-optimal solutions with few strong oracle queries.

Contribution

It presents the first $(1+\varepsilon)$-coresets for fair $k$-median clustering that use a number of strong oracle queries polylogarithmic in $n$, advancing fair clustering with inaccurate data.

Findings

1. Achieved $(1+\varepsilon)$-coresets for fair $k$-median clustering.
2. Extended the results to standard clustering and $(k,z)$-clustering without fairness constraints.
3. Reduced the number of strong oracle queries needed for near-optimal clustering.
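To make the coreset guarantee concrete: in the unconstrained setting, a weighted subset is a $(1+\varepsilon)$-coreset if, for every set of $k$ centers, its weighted clustering cost is within a $(1 \pm \varepsilon)$ factor of the cost of the full input. The Python sketch below is illustrative only; the function names are hypothetical and not taken from the paper, and it assumes Euclidean distances for simplicity, whereas the paper accesses a general metric through oracles. It computes the weighted $(k,z)$-clustering cost ($z = 1$ is $k$-median) and checks the relative-error condition for one fixed center set. It ignores fairness constraints, which in the fair setting restrict how points may be assigned to centers, and one check cannot certify a coreset, because the guarantee must hold for all center sets.

```python
import math
from typing import Sequence

# Hypothetical helper types for illustration; points are Euclidean tuples (assumption).
Point = tuple[float, ...]


def kz_cost(points: Sequence[Point], weights: Sequence[float],
            centers: Sequence[Point], z: float = 1.0) -> float:
    """Weighted (k, z)-clustering cost: each point pays
    weight * (distance to its nearest center) ** z.
    With z = 1 this is the k-median objective. No fairness constraint is applied:
    every point is simply assigned to its nearest center."""
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(math.dist(p, c) for c in centers)
        total += w * nearest ** z
    return total


def within_relative_error(full_cost: float, coreset_cost: float, eps: float) -> bool:
    """Check |coreset_cost - full_cost| <= eps * full_cost for ONE center set.
    A real coreset must satisfy this for every choice of k centers."""
    return abs(coreset_cost - full_cost) <= eps * full_cost


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    centers = [(0.5, 0.0), (10.5, 0.0)]
    full = kz_cost(pts, [1.0] * len(pts), centers)
    # Toy weighted subset: one representative per cluster, each with weight 2.
    core = kz_cost([(0.0, 0.0), (10.0, 0.0)], [2.0, 2.0], centers)
    print(full, core, within_relative_error(full, core, eps=0.1))
```

In this toy example both costs equal 2.0, so the relative-error check passes for this particular center set; moving the centers would generally make the two costs diverge, which is why coreset constructions must bound the error uniformly over all center sets.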

Abstract

We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle that provides exact distances at a high cost, and a weak oracle that provides potentially inaccurate distance estimates at a low cost. The goal is to produce a near-optimal fair clustering of $n$ input points with as few strong oracle queries as possible. This models the increasingly common trade-off between accurate but expensive similarity measures (e.g., large-scale embeddings) and cheaper but inaccurate alternatives. The study of fair clustering in this model is motivated by the important goal of achieving fairness in the presence of inaccurate information. We obtain the first $(1 + \varepsilon)$-coresets for fair $k$-median clustering using $\mathrm{poly}\left(\frac{k}{\varepsilon} \cdot \log n\right)$ queries to the strong oracle. Furthermore, our results imply…
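The two-oracle setting can be sketched as a pair of distance interfaces, one exact and metered, one cheap and unreliable. The Python sketch below is a minimal illustration, not the paper's formal definition: the class names `StrongOracle` and `WeakOracle` and the parameters `delta` and `seed` are hypothetical. It assumes one simple noise model for the weak oracle, in which each point is independently "corrupted" with probability `delta` and any query touching a corrupted point returns an arbitrary value (here, a random one). The paper's precise weak-oracle model may differ.

```python
import math
import random


class StrongOracle:
    """Exact distances. Every call is counted, because strong queries are
    the expensive resource an algorithm in this model tries to minimize."""

    def __init__(self, points):
        self.points = points
        self.queries = 0

    def dist(self, i, j):
        self.queries += 1
        return math.dist(self.points[i], self.points[j])


class WeakOracle:
    """Cheap distance estimates under an ASSUMED noise model: each point is
    corrupted independently with probability delta; queries involving a
    corrupted point return an arbitrary (here: random) value, and all other
    queries return the true distance."""

    def __init__(self, points, delta, seed=0):
        self.points = points
        self._rng = random.Random(seed)
        self.corrupted = {i for i in range(len(points)) if self._rng.random() < delta}

    def dist(self, i, j):
        true_d = math.dist(self.points[i], self.points[j])
        if i in self.corrupted or j in self.corrupted:
            # Arbitrary answer; a real adversary could return anything.
            return self._rng.uniform(0.0, 10.0 * (true_d + 1.0))
        return true_d


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    strong = StrongOracle(pts)
    weak = WeakOracle(pts, delta=0.3, seed=1)
    print("weak:", weak.dist(0, 1), "strong:", strong.dist(0, 1))
    print("strong queries used:", strong.queries)
```

Only the strong oracle keeps a query counter here, since its cost is the quantity the $\mathrm{poly}\left(\frac{k}{\varepsilon} \cdot \log n\right)$ bound refers to; an algorithm in this model would use many weak queries to narrow down candidates and spend strong queries only where weak answers cannot be trusted.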

Videos

Relative Error Fair Clustering in the Weak-Strong Oracle Model · SlidesLive

Taxonomy

Topics: Data Mining Algorithms and Applications · Bayesian Modeling and Causal Inference · Data Stream Mining Techniques