Learning Populations of Parameters

Kevin Tian; Weihao Kong; Gregory Valiant

arXiv:1709.02707 · cs.LG · November 23, 2017 · 5 cites

TL;DR

This paper gives a method for estimating the distribution of parameters across a population from binomial observations. For a sufficiently large population it achieves the information-theoretically optimal $O(1/t)$ earth mover error, and it extends to multi-dimensional parameters.

Contribution

It presents an estimation technique that improves on the empirical estimates, recovering the histogram of parameters to information-theoretically optimal error, and extends the result to populations whose members each have multiple parameters.

Findings

01

Achieves $O(1/t)$ earth mover error in histogram recovery when $n$ is sufficiently large, which is information-theoretically optimal.

02

Extends the method to multi-dimensional parameter settings.

03

Demonstrates that the recovery algorithm performs well in practice on a variety of datasets.

Abstract

Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_{i} \in [0, 1]$, and we observe $n$ independent random variables, $X_{1}, \dots, X_{n}$, with $X_{i} \sim \mathrm{Binomial}(t, p_{i})$. How accurately can one recover the "histogram" (i.e. cumulative distribution function) of the $p_{i}$'s? While the empirical estimates would recover the histogram to earth mover distance $\Theta(\frac{1}{\sqrt{t}})$ (equivalently, $\ell_{1}$ distance between the CDFs), we show that, provided $n$ is sufficiently large, we can achieve error $O(\frac{1}{t})$, which is information theoretically optimal. We also extend our results to the multi-dimensional parameter case, capturing settings where each member of the population has multiple associated parameters. Beyond the theoretical results, we demonstrate that the recovery algorithm performs well in practice on a variety of datasets,…
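
To make the problem setup concrete, the following is a minimal simulation sketch of the empirical baseline only, not the paper's recovery algorithm. It assumes NumPy, and the function names, the population size, and the choice of a population in which every $p_{i} = 0.5$ are illustrative assumptions rather than anything taken from the paper. It draws $X_{i} \sim \mathrm{Binomial}(t, p_{i})$, forms the empirical estimates $X_{i}/t$, and measures the earth mover distance to the true parameters. For two equally weighted one-dimensional samples of the same size, that distance is the mean absolute difference of the sorted values.

```python
import numpy as np


def earth_mover_distance(a, b):
    """Earth mover (Wasserstein-1) distance between two equal-size,
    equally weighted 1-D samples: mean absolute gap of sorted values."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same length")
    return float(np.mean(np.abs(a - b)))


def empirical_baseline_error(p, t, rng):
    """Draw X_i ~ Binomial(t, p_i) and return the earth mover distance
    between the empirical estimates X_i / t and the true p_i."""
    p = np.asarray(p, dtype=float)
    x = rng.binomial(t, p)   # one observation per entity
    p_hat = x / t            # empirical estimate of each p_i
    return earth_mover_distance(p, p_hat)


# Hypothetical population (an assumption for illustration): every p_i = 0.5.
rng = np.random.default_rng(0)
n = 100_000
p_true = np.full(n, 0.5)

errors = {t: empirical_baseline_error(p_true, t, rng) for t in (4, 16, 64)}
for t, err in errors.items():
    print(f"t={t:3d}  empirical earth mover error = {err:.4f}")
```

On this population the empirical error should shrink by roughly a factor of two each time $t$ is quadrupled, which is the $1/\sqrt{t}$ behaviour of the baseline that the paper improves on.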

Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Machine Learning and Data Classification