Dependent randomized rounding for clustering and partition systems with knapsack constraints
David G. Harris, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

TL;DR
This paper introduces new randomized algorithms for clustering problems with knapsack constraints, motivated by fairness and related considerations, and proves a new tail bound for sums of random variables with unbounded variances.
Contribution
The paper develops a dependent randomized rounding technique for clustering with knapsack and partition constraints, covering the multi-knapsack median and multi-knapsack center problems, and uses it to obtain new approximation and pseudo-approximation guarantees.
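For readers unfamiliar with dependent rounding, the sketch below shows the classic pairwise step on a single cardinality constraint: a fractional vector in [0,1]^n whose entries sum to an integer k is rounded to a 0/1 vector with exactly k ones, while each coordinate keeps its fractional value as its probability of becoming 1. This is a generic textbook illustration, not the paper's algorithm; the function name `dependent_round` and the pair-selection rule (the first two fractional entries) are assumptions, and the knapsack and partition constraints the paper handles are not modeled here.

```python
import random


def dependent_round(x, rng=None, eps=1e-9):
    """Round x in [0,1]^n with integral sum to a 0/1 vector that
    (a) has exactly the same sum as x and (b) satisfies E[out_i] = x_i.

    Hypothetical helper illustrating the classic pairwise dependent-rounding
    step; it is not the rounding procedure from the paper.
    """
    if rng is None:
        rng = random.Random()
    total = sum(x)
    if abs(total - round(total)) > eps:
        raise ValueError("this sketch assumes sum(x) is an integer")
    y = [float(v) for v in x]
    while True:
        frac = [i for i, v in enumerate(y) if eps < v < 1 - eps]
        if len(frac) < 2:
            break  # with an integral sum, no genuinely fractional entry remains
        i, j = frac[0], frac[1]
        up = min(1 - y[i], y[j])    # largest shift raising y_i and lowering y_j
        down = min(y[i], 1 - y[j])  # largest shift lowering y_i and raising y_j
        # These probabilities make E[change in y_i] = 0, so marginals are kept;
        # y_i + y_j is unchanged, so the total sum is kept exactly.
        if rng.random() < down / (up + down):
            y[i], y[j] = y[i] + up, y[j] - up
        else:
            y[i], y[j] = y[i] - down, y[j] + down
        # Either branch makes at least one of y_i, y_j integral.
    return [int(round(v)) for v in y]


if __name__ == "__main__":
    x = [0.5, 0.5, 0.25, 0.75]  # fractional "open this center" values, sum 2
    print(dependent_round(x, random.Random(0)))
```

Each iteration fixes at least one coordinate, so the loop runs at most n times. The hard constraint (the total) holds with probability 1, while individual coordinates are only preserved in expectation; extending this kind of guarantee to several knapsack and partition constraints at once is where the paper's contribution lies.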
Findings
New approximation and pseudo-approximation algorithms for multi-knapsack median and multi-knapsack center.
A novel tail bound for sums of random variables with unbounded variances.
Enhanced understanding of clustering under fairness and resource constraints.
Abstract
Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on fairness in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster-centers. This, and much more general clustering problems, can be formulated with "knapsack" and "partition" constraints. We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center. Our rounding algorithms give new approximation and pseudo-approximation algorithms for these problems. One key technical tool, which may be of independent interest, is a new tail bound analogous to Feige (2006) for sums of random variables with unbounded variances. Such bounds can be useful in inferring properties of large networks using few samples.
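The abstract describes its tail bound as analogous to Feige (2006). Feige's inequality states that if X_1, ..., X_n are independent, nonnegative, and each has expectation 1, then Pr[X_1 + ... + X_n < n + 1] ≥ 1/13, with no assumption on the variances (Feige conjectured that the constant can be raised to 1/e). The Monte Carlo sketch below estimates that probability for two illustrative distributions. It demonstrates Feige's inequality, not the new bound proved in this paper; the helper names (`feige_event_prob`, `two_point`, `scaled_pareto`), n = 20, and the 20,000 trials are arbitrary choices.

```python
import random


def feige_event_prob(sample_one, n, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[X_1 + ... + X_n < n + 1], where each X_i is
    drawn independently by sample_one(rng) and is assumed to have mean 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(sample_one(rng) for _ in range(n))
        if s < n + 1:  # E[sum] = n, so this is "sum < mean + 1"
            hits += 1
    return hits / trials


def two_point(n):
    """X = n + 1 with probability 1/(n + 1), else 0; the mean is exactly 1."""
    return lambda rng: float(n + 1) if rng.random() < 1 / (n + 1) else 0.0


def scaled_pareto(rng, alpha=1.5):
    """Pareto(alpha) has mean alpha/(alpha - 1) and, for alpha <= 2, infinite
    variance; rescaling by (alpha - 1)/alpha makes the mean exactly 1."""
    return rng.paretovariate(alpha) * (alpha - 1) / alpha


if __name__ == "__main__":
    n = 20
    print("two-point:", feige_event_prob(two_point(n), n))
    print("Pareto(1.5):", feige_event_prob(scaled_pareto, n))
    print("Feige's guaranteed lower bound:", 1 / 13)
```

The two-point variable is the standard near-extremal case: the sum stays below n + 1 exactly when every X_i is 0, which happens with probability (n/(n+1))^n, about 0.377 for n = 20 and decreasing toward 1/e as n grows. The Pareto variant has infinite variance, so Chebyshev-style arguments give no information; this is the regime where bounds of Feige's type are needed.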
Taxonomy
Topics: Complexity and Algorithms in Graphs · Random Matrices and Applications · Privacy-Preserving Technologies in Data
