TL;DR
This paper frames the assignment of weights to data samples as an optimization problem, convex in many cases, so that the weighted sample is representative. Selecting a representative subset is a special case. The open-source implementation, rsw, is applied to a skewed sample of the CDC BRFSS dataset.
Contribution
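It formulates representative sample weighting as an optimization problem that is convex in many cases and can then be solved efficiently. Selecting a fixed number of equally weighted samples is included as a special case, although that case is combinatorial and not convex.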
Findings
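In many cases, finding representative sample weights is a convex problem that can be solved efficiently.
Heuristic methods based on convex optimization seem to perform very well on the non-convex subset selection problem.
The open-source tool rsw implements these ideas and is applied to a skewed sample of the CDC BRFSS dataset.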
Abstract
We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting, which happens when certain sample averages of the data are close to prescribed values. We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved. Our formulation includes as a special case the selection of a fixed number of the samples, with equal weights, i.e., the problem of selecting a smaller representative subset of the samples. While this problem is combinatorial and not convex, heuristic methods based on convex optimization seem to perform very well. We describe rsw, an open-source implementation of the ideas described in this paper, and apply it to a skewed sample of the CDC BRFSS dataset.
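To make the idea of a representative weighting concrete, here is a minimal sketch of one simple convex instance. It is not the rsw API or the paper's algorithm. It assumes a single sample average that must match its prescribed value exactly, and it picks the maximum-entropy weights among all weightings that do so. Under those assumptions the solution has the form w_i ∝ exp(λ x_i), so a bisection on the scalar λ is enough. The function name `max_entropy_weights`, the search bracket, and the toy data are all hypothetical.

```python
import math


def max_entropy_weights(x, target, lo=-50.0, hi=50.0, iters=200):
    """Return weights w >= 0 with sum(w) == 1 and sum(w_i * x_i) == target.

    Among all such weightings, this picks the one with maximum entropy,
    which has the form w_i proportional to exp(lam * x_i).
    Assumption: the bracket [lo, hi] contains the right lam, which holds
    for small, well-scaled data such as the toy example below.
    """
    if not (min(x) < target < max(x)):
        raise ValueError("target must lie strictly between min(x) and max(x)")

    def tilted(lam):
        # Subtract the largest exponent for numerical stability.
        m = max(lam * xi for xi in x)
        e = [math.exp(lam * xi - m) for xi in x]
        s = sum(e)
        return [ei / s for ei in e]

    def weighted_mean(lam):
        return sum(wi * xi for wi, xi in zip(tilted(lam), x))

    # The weighted mean is increasing in lam, so bisection applies.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weighted_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))


# Hypothetical skewed sample: an indicator that is 1 for 7 of 10 records,
# while the prescribed population average is 0.5.
x = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
weights = max_entropy_weights(x, target=0.5)
weighted_mean = sum(w * xi for w, xi in zip(weights, x))
```

In this toy case the over-represented records are down-weighted and the under-represented ones are up-weighted until the weighted average matches the target. The paper's formulation is more general: it asks for sample averages that are close to, not necessarily equal to, the prescribed values, and it handles several averages at once.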
