Distribution Learning with Valid Outputs Beyond the Worst-Case
Nick Rittler, Kamalika Chaudhuri

TL;DR
This paper identifies conditions under which distribution learning can guarantee valid outputs with fewer validity queries than worst-case bounds require, focusing on data that lies within the model class and on VC-classes.
Contribution
It characterizes regimes where guaranteeing validity is easier than in the worst case, showing that sample complexity depends only weakly on the validity constraint and that a limited number of validity queries suffices for VC-classes.
Findings
Sample complexity weakly depends on validity constraints.
Limited validity queries are often sufficient for VC-classes.
Guaranteeing validity is easier under certain distribution and model assumptions.
Abstract
Generative models at times produce "invalid" outputs, such as images with generation artifacts and unnatural sounds. Validity-constrained distribution learning attempts to address this problem by requiring that the learned distribution have a provably small fraction of its mass in invalid parts of space -- something which standard loss minimization does not always ensure. To this end, a learner in this model can guide the learning via "validity queries", which allow it to ascertain the validity of individual examples. Prior work on this problem takes a worst-case stance, showing that proper learning requires an exponential number of validity queries, and demonstrating an improper algorithm which -- while generating guarantees in a wide range of settings -- makes an atypical polynomial number of validity queries. In this work, we take a first step towards characterizing regimes where…
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
