Learning Populations of Parameters
Kevin Tian, Weihao Kong, Gregory Valiant

TL;DR
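This paper gives a method for estimating the distribution of parameters across a population from binomial observations. With $t$ trials per entity and a sufficiently large population, it recovers the distribution to $O(1/t)$ earth mover distance, which is information-theoretically optimal, and it extends to multi-dimensional parameters.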
Contribution
The paper presents an estimator that improves on the naive empirical estimates and recovers the parameter histogram with information-theoretically optimal error. The results also cover the case where each member of the population has several parameters.
Findings
Achieves $O(1/t)$ earth mover distance error in histogram recovery, provided the population is sufficiently large; this rate is information-theoretically optimal.
Extends the method to multi-dimensional parameter settings.
Demonstrates practical effectiveness on diverse real-world datasets.
Abstract
Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_i \in [0,1]$, and we observe $n$ independent random variables, $X_1, \ldots, X_n$, with $X_i \sim \mathrm{Binomial}(t, p_i)$. How accurately can one recover the "histogram" (i.e. cumulative distribution function) of the $p_i$'s? While the empirical estimates would recover the histogram to earth mover distance $\Theta(1/\sqrt{t})$ (equivalently, $\ell_1$ distance between the CDFs), we show that, provided $n$ is sufficiently large, we can achieve error $O(1/t)$, which is information theoretically optimal. We also extend our results to the multi-dimensional parameter case, capturing settings where each member of the population has multiple associated parameters. Beyond the theoretical results, we demonstrate that the recovery algorithm performs well in practice on a variety of datasets,…
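To make the setup concrete, here is a minimal Python sketch of the observation model and of the baseline empirical estimate $X_i/t$. It is an illustration under stated assumptions, not the paper's recovery algorithm. The Beta(2, 5) population and the helper names `simulate`, `emd_1d`, and `moment_estimates` are hypothetical choices made for this example. The last helper uses the standard identity $\mathbb{E}\left[\binom{X}{k}\right] = \binom{t}{k} p^k$ for $X \sim \mathrm{Binomial}(t, p)$, which yields unbiased estimates of the first $t$ moments of the parameter distribution even though each individual $p_i$ is observed only noisily.

```python
from math import comb

import numpy as np


def simulate(n, t, rng):
    """Draw n hypothetical parameters and one Binomial(t, p_i) count per entity."""
    # Assumed population for illustration only: p_i ~ Beta(2, 5).
    p = rng.beta(2.0, 5.0, size=n)
    x = rng.binomial(t, p)  # X_i ~ Binomial(t, p_i)
    return p, x


def emd_1d(a, b):
    """Earth mover distance between two equal-size samples on the line.

    For uniform weights this is the mean absolute difference of the
    sorted values, i.e. the l1 distance between the empirical CDFs.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return float(np.mean(np.abs(a - b)))


def moment_estimates(x, t):
    """Unbiased estimates of the k-th moments of the p_i's, k = 1..t.

    Uses E[C(X, k)] = C(t, k) * p^k for X ~ Binomial(t, p).
    """
    x = np.asarray(x, dtype=int)
    counts = np.bincount(x, minlength=t + 1)  # how many entities saw j successes
    est = []
    for k in range(1, t + 1):
        weights = np.array([comb(j, k) / comb(t, k) for j in range(t + 1)])
        est.append(float(counts @ weights) / x.size)
    return np.array(est)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for t in (5, 20, 80):
        p, x = simulate(20000, t, rng)
        # Error of the naive empirical histogram built from X_i / t.
        print(t, emd_1d(x / t, p))
    p, x = simulate(20000, 5, rng)
    print(moment_estimates(x, 5)[:3], [float(np.mean(p**k)) for k in (1, 2, 3)])
```

Running the script prints the earth mover distance of the empirical estimate for a few values of $t$, followed by the first three estimated moments next to the corresponding moments of the simulated $p_i$'s. Turning such moment estimates into a full histogram is a separate step that this sketch does not attempt.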
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Machine Learning and Data Classification
