Data Amplification: Instance-Optimal Property Estimation

Yi Hao; Alon Orlitsky

arXiv:1903.01432 · math.ST · March 6, 2019 · 5 cites


TL;DR

This paper introduces efficient estimators of distribution properties that substantially improve on the accuracy of the empirical plug-in estimator. In effect they amplify the available data without requiring additional samples, and they outperform previous methods across a range of properties.

Contribution

The paper presents linear-time estimators that effectively amplify the available data, achieving near-optimal accuracy for multiple distribution properties and for every underlying distribution.

Findings

01

For Shannon entropy and a broad class of properties including ℓ1-distance, the new estimators achieve with n samples the accuracy that empirical estimators attain with n log n samples.

02

The new estimators outperform previous state-of-the-art methods across a range of properties.

03

The amplification factors are proven to be optimal.
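To make the n versus n log n comparison concrete, the sketch below measures how the error of the plain plug-in (empirical-frequency) entropy estimator shrinks when its sample size grows from n to about n log n. It illustrates only the baseline the findings compare against; it does not implement the paper's estimators. The test distribution (uniform over k = 1000 symbols), the natural logarithm, the trial count, and the seed are illustrative assumptions, and constant factors in n log n are ignored.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (empirical-frequency) Shannon entropy estimate, in nats."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_abs_error(k, n, trials, rng):
    """Average |plug-in estimate - true entropy| for the uniform distribution on k symbols."""
    true_h = math.log(k)  # entropy of the uniform distribution on k symbols
    total = 0.0
    for _ in range(trials):
        samples = [rng.randrange(k) for _ in range(n)]
        total += abs(plugin_entropy(samples) - true_h)
    return total / trials


rng = random.Random(0)               # fixed seed (illustrative assumption)
k = 1000                             # support size of the hypothetical test distribution
n = 1000                             # sample size actually "available"
n_amp = math.ceil(n * math.log(n))   # about n log n, natural log, constants ignored

err_n = mean_abs_error(k, n, trials=20, rng=rng)
err_amp = mean_abs_error(k, n_amp, trials=20, rng=rng)
print(f"n={n}: {err_n:.3f} nats; n log n={n_amp}: {err_amp:.3f} nats")
```

Because symbols that were never sampled contribute nothing, the plug-in entropy estimate is biased downward, and on this setup that bias shrinks markedly at the larger sample size. Finding 01 states that the new estimators reach the larger-sample accuracy using only n samples.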

Abstract

The best-known and most commonly used distribution-property estimation technique uses a plug-in estimator, with empirical frequency replacing the underlying distribution. We present novel linear-time-computable estimators that significantly "amplify" the effective amount of data available. For a large variety of distribution properties, including four of the most popular ones, and for every underlying distribution, they achieve the accuracy that the empirical-frequency plug-in estimators would attain using a logarithmic factor more samples. Specifically, for Shannon entropy and a very broad class of properties including $\ell_1$-distance, the new estimators use $n$ samples to achieve the accuracy attained by the empirical estimators with $n \log n$ samples. For support-size and coverage, the new estimators use $n$ samples to achieve the performance of empirical frequency with sample size…
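The plug-in baseline described in the abstract is simple to state: count each symbol, divide by n, and evaluate the property on the resulting empirical distribution, all in one linear pass over the sample. The sketch below does this for Shannon entropy, support size, and ℓ1-distance. It is the baseline the new estimators are measured against, not the paper's estimators. It assumes that ℓ1-distance means distance from a fixed, known reference distribution q; the helper names and the example q are hypothetical.

```python
import math
from collections import Counter


def empirical_distribution(samples):
    """Map each observed symbol to its empirical frequency N_x / n (one O(n) pass)."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}


def plugin_entropy(p_hat):
    """Shannon entropy of the empirical distribution, in nats."""
    return -sum(p * math.log(p) for p in p_hat.values())


def plugin_support_size(p_hat):
    """Number of distinct observed symbols (symbols with positive empirical mass)."""
    return len(p_hat)


def plugin_l1_distance(p_hat, q):
    """l1 distance between the empirical distribution and a known reference q.

    Assumption: q is a dict over the full alphabet; unsampled symbols have p_hat = 0.
    """
    alphabet = set(q) | set(p_hat)
    return sum(abs(p_hat.get(x, 0.0) - q.get(x, 0.0)) for x in alphabet)


samples = ["a", "b", "a", "c", "a", "b"]
p_hat = empirical_distribution(samples)            # a -> 3/6, b -> 2/6, c -> 1/6
q = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}   # hypothetical uniform reference
print(plugin_entropy(p_hat), plugin_support_size(p_hat), plugin_l1_distance(p_hat, q))
```

Each function looks only at the observed symbols, except the ℓ1 helper, which also charges the reference mass of symbols that were never sampled (here the symbol "d").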


Videos

Data Amplification: Instance-Optimal Property Estimation (SlidesLive)

Taxonomy

Topics: Machine Learning and Algorithms · Bayesian Modeling and Causal Inference · Algorithms and Data Compression