Efficiently Learning Structured Distributions from Untrusted Batches

Sitan Chen; Jerry Li; Ankur Moitra

arXiv:1911.02035·cs.DS·November 7, 2019

TL;DR

This paper develops polynomial-time algorithms for learning distributions from untrusted batches, achieving near-optimal error rates and reducing sample complexity by incorporating prior distribution shape knowledge.

Contribution

It introduces a general sum-of-squares framework for robust distribution learning that handles complex constraints and prior knowledge, improving efficiency and sample complexity.

Findings

01

Algorithms approach the information-theoretic error bound.

02

Sample complexity is reduced to polylogarithmic in n for many distribution classes.

03

The framework can incorporate constraints on the shape of the distribution, including combinatorial constraints such as those arising from VC theory.

Abstract

We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume $m$ users, all of whom have samples from some underlying distribution $p$ over $\{1, \dots, n\}$. Each user sends a batch of $k$ i.i.d. samples from this distribution; however, an $\epsilon$-fraction of users are untrustworthy and can send adversarially chosen responses. The goal is then to learn $p$ in total variation distance. When $k = 1$ this is the standard robust univariate density estimation setting and it is well-understood that $\Omega(\epsilon)$ error is unavoidable. Surprisingly, Qiao and Valiant gave an estimator which improves upon this rate when $k$ is large. Unfortunately, their algorithms run in time exponential in either $n$ or $k$. We first give a sequence of polynomial time algorithms whose estimation error approaches the information-theoretically optimal bound for…
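
To make the setup concrete, the following is a minimal simulation sketch of the data model only, not of the paper's sum-of-squares algorithm. The function names (`sample_batches`, `naive_estimate`, `total_variation`), the 0-indexed support, and the fixed "always report the last symbol" adversary are all assumptions made for illustration; a real adversary may choose its batches far more carefully.

```python
import random
from collections import Counter


def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sample_batches(p, m, k, eps, rng):
    """Return m batches of k samples each; an eps-fraction of batches is corrupted.

    The support is {0, ..., n-1} (0-indexed for convenience).
    Hypothetical adversary: every corrupted batch repeats the last symbol.
    """
    n = len(p)
    num_bad = int(eps * m)
    batches = []
    for i in range(m):
        if i < num_bad:
            batches.append([n - 1] * k)  # adversarial batch
        else:
            batches.append(rng.choices(range(n), weights=p, k=k))  # i.i.d. from p
    return batches


def naive_estimate(batches, n):
    """Pool all samples and return empirical frequencies (not robust)."""
    counts = Counter(x for batch in batches for x in batch)
    total = sum(counts.values())
    return [counts[i] / total for i in range(n)]


rng = random.Random(0)
p = [0.25, 0.25, 0.25, 0.25]
batches = sample_batches(p, m=200, k=50, eps=0.1, rng=rng)
p_hat = naive_estimate(batches, n=len(p))
tv = total_variation(p, p_hat)
print(f"TV error of the naive pooled estimate: {tv:.3f}")
```

Under this toy adversary the pooled estimate incurs error proportional to $\epsilon$, which is the baseline that batch-aware estimators aim to improve on when $k$ is large.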

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning