Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights
Ameet Gadekar, Aristides Gionis, Suhas Thejaswi

TL;DR
This paper introduces improved approximation algorithms for fair data summarization modeled as the fair $k$-supplier problem, providing theoretical guarantees and demonstrating scalability on large datasets.
Contribution
The paper presents 3-approximation algorithms for both the disjoint and the overlapping group variants, improving on the previous best factor of 5, with polynomial-time and fixed-parameter tractable running times.
Findings
Algorithms achieve tight approximation bounds under standard complexity assumptions.
Scalability is demonstrated on large synthetic datasets.
The impact of fairness constraints on solution quality is analyzed on real-world data.
Abstract
Data summarization tasks are often modeled as $k$-clustering problems, where the goal is to choose $k$ data points, called cluster centers, that best represent the dataset by minimizing a clustering objective. A popular objective is to minimize the maximum distance between any data point and its nearest center, which is formalized as the $k$-center problem. While in some applications all data points can be chosen as centers, in the general setting, centers must be chosen from a predefined subset of points, referred to as facilities or suppliers; this is known as the $k$-supplier problem. In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group while minimizing the $k$-supplier objective. The groups can be disjoint or overlapping, leading to two distinct…
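To make the problem definition in the abstract concrete, here is a minimal sketch that evaluates the fair $k$-supplier objective by exhaustive search on a toy instance. This is not the paper's 3-approximation algorithm: it enumerates every set of $k$ facilities and is only usable on tiny inputs. The function name `fair_k_supplier_bruteforce`, its parameters, and the toy data are all hypothetical, chosen for illustration, and Euclidean distance is an assumption.

```python
from itertools import combinations
from math import dist, inf


def fair_k_supplier_bruteforce(clients, facilities, groups, requirements, k):
    """Exhaustively solve a tiny fair k-supplier instance (illustration only).

    clients:      list of points (tuples) that must be served
    facilities:   list of points (tuples) that may be chosen as centers
    groups:       groups[i] is the set of group labels of facility i
                  (sets may overlap, covering both variants)
    requirements: dict mapping group label -> minimum number of centers
    k:            number of centers to choose
    """
    best_cost, best_centers = inf, None
    for chosen in combinations(range(len(facilities)), k):
        # Fairness check: every group must supply at least its required count.
        feasible = all(
            sum(1 for i in chosen if g in groups[i]) >= need
            for g, need in requirements.items()
        )
        if not feasible:
            continue
        # k-supplier objective: the largest distance from any client
        # to its nearest chosen center.
        cost = max(min(dist(c, facilities[i]) for i in chosen) for c in clients)
        if cost < best_cost:
            best_cost, best_centers = cost, chosen
    return best_cost, best_centers


# Hypothetical toy instance: three clients, four facilities on a line.
clients = [(0, 0), (2, 0), (10, 0)]
facilities = [(1, 0), (3, 0), (9, 0), (14, 0)]
groups = [{"a"}, {"a"}, {"a"}, {"b"}]

# Without fairness requirements the best radius is 1.0 (facilities 0 and 2).
print(fair_k_supplier_bruteforce(clients, facilities, groups, {}, 2))
# Requiring one center from each group forces facility 3, raising the radius to 4.0.
print(fair_k_supplier_bruteforce(clients, facilities, groups, {"a": 1, "b": 1}, 2))
```

On this instance the fairness requirement increases the optimal radius from 1.0 to 4.0, a small illustration of how group constraints can affect solution quality.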
Taxonomy
Topics: Bayesian Methods and Mixture Models · Data Mining Algorithms and Applications · Data Quality and Management
