Efficiently Learning Structured Distributions from Untrusted Batches
Sitan Chen, Jerry Li, Ankur Moitra

TL;DR
This paper develops polynomial-time algorithms for learning a discrete distribution from untrusted batches. Their error approaches the information-theoretically optimal rate, and they reduce sample complexity by exploiting prior knowledge about the shape of the distribution.
Contribution
It introduces a general framework, built on the sum-of-squares hierarchy, for robust distribution learning from batches. The framework can encode structural constraints and prior knowledge, which yields polynomial running time and improved sample complexity.
Findings
The algorithms' estimation error approaches the information-theoretically optimal bound.
For many structured distribution classes, the sample complexity drops to polylogarithmic in the domain size n.
The framework can incorporate shape constraints on the distribution, using tools from VC theory.
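For reference, the error benchmarks can be written out explicitly. Here $p$ and $q$ are distributions over $\{1, \ldots, n\}$, $k$ is the number of samples per batch, and $\epsilon$ is the fraction of untrustworthy users. The $\Omega(\epsilon)$ rate for $k = 1$ comes from the abstract. The $\Theta(\epsilon/\sqrt{k})$ optimal rate is an assumption recalled from Qiao and Valiant's analysis and does not appear in the text here.

```latex
d_{\mathrm{TV}}(p, q) = \frac{1}{2} \sum_{i=1}^{n} \lvert p_i - q_i \rvert,
\qquad
\text{achievable error} =
\begin{cases}
\Omega(\epsilon) & \text{if } k = 1 \text{ (standard robust density estimation)},\\[2pt]
\Theta\!\left(\epsilon / \sqrt{k}\right) & \text{information-theoretic optimum for general } k \text{ (assumed)}.
\end{cases}
```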
Abstract
We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume $m$ users, all of whom have samples from some underlying distribution $p$ over $\{1, \ldots, n\}$. Each user sends a batch of $k$ i.i.d. samples from this distribution; however, an $\epsilon$-fraction of users are untrustworthy and can send adversarially chosen responses. The goal is then to learn $p$ in total variation distance. When $k = 1$, this is the standard robust univariate density estimation setting, and it is well understood that $\Omega(\epsilon)$ error is unavoidable. Surprisingly, Qiao and Valiant gave an estimator which improves upon this rate when $k$ is large. Unfortunately, their algorithms run in time exponential in either $n$ or $k$. We first give a sequence of polynomial time algorithms whose estimation error approaches the information-theoretically optimal bound for…
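The batch model can be made concrete with a small simulation. The sketch below is illustrative only and is not the paper's algorithm. It generates $m$ batches of $k$ samples, corrupts an $\epsilon$-fraction, and measures the total variation error of the naive "pool everything" estimator. The helper names (`make_batches`, `pooled_estimate`, `tv_distance`) and the adversary are hypothetical. The adversary here simply has every bad user report element 0 in all $k$ slots. Against this attack the pooled estimate stays about $\epsilon(1 - 1/n)$ away from $p$ in TV regardless of $k$. This is why beating the $\Omega(\epsilon)$ rate requires using the batch structure.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two distributions on {0, ..., n-1}."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def make_batches(p, m, k, eps, rng):
    """Simulate m users, each sending k i.i.d. samples from p.

    A hypothetical adversary controls round(eps * m) users; each bad user
    reports element 0 in all k slots. Returns an (m, k) integer array.
    """
    n = len(p)
    num_bad = int(round(eps * m))
    good = rng.choice(n, size=(m - num_bad, k), p=p)  # honest batches
    bad = np.zeros((num_bad, k), dtype=good.dtype)    # adversarial batches
    return np.vstack([good, bad])

def pooled_estimate(batches, n):
    """Naive baseline: ignore batch boundaries and return empirical frequencies."""
    counts = np.bincount(batches.ravel(), minlength=n)
    return counts / counts.sum()

rng = np.random.default_rng(0)
n, m, eps = 20, 5000, 0.1
p = np.full(n, 1.0 / n)  # uniform ground-truth distribution
for k in (1, 10, 100):
    batches = make_batches(p, m, k, eps, rng)
    err = tv_distance(p, pooled_estimate(batches, n))
    print(f"k={k:4d}  TV error of pooled estimate: {err:.3f}")
```

Every line printed should be roughly $0.095$. Increasing $k$ only reduces sampling noise, not the bias introduced by the corrupted batches.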
Taxonomy
Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning
