Distribution Learning with Valid Outputs Beyond the Worst-Case

Nick Rittler; Kamalika Chaudhuri

arXiv:2410.16253 · cs.LG · October 22, 2024

TL;DR

This paper explores conditions under which distribution learning can reliably produce valid outputs with fewer validity queries than worst-case analysis requires, focusing on data within certain model classes and on VC-classes.

Contribution

It characterizes regimes where guaranteeing validity is easier than in the worst case, showing that sample complexity depends only weakly on the validity constraints and that a limited number of validity queries suffices for VC-classes.

Findings

01 Sample complexity weakly depends on validity constraints.

02 Limited validity queries are often sufficient for VC-classes.

03 Guaranteeing validity is easier under certain distribution and model assumptions.

Abstract

Generative models at times produce "invalid" outputs, such as images with generation artifacts and unnatural sounds. Validity-constrained distribution learning attempts to address this problem by requiring that the learned distribution have a provably small fraction of its mass in invalid parts of space -- something which standard loss minimization does not always ensure. To this end, a learner in this model can guide the learning via "validity queries", which allow it to ascertain the validity of individual examples. Prior work on this problem takes a worst-case stance, showing that proper learning requires an exponential number of validity queries, and demonstrating an improper algorithm which -- while generating guarantees in a wide range of settings -- makes an atypical polynomial number of validity queries. In this work, we take a first step towards characterizing regimes where…
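
To make the notion of a "validity query" concrete, the sketch below estimates how much mass a learned distribution places on invalid examples by sampling from it and querying a validity oracle on each sample. This is only an illustration of the query model described in the abstract, not an algorithm from the paper. The names `queries_needed`, `estimate_invalid_mass`, the toy uniform sampler, and the threshold oracle are all hypothetical. The query count comes from the standard Hoeffding bound.

```python
import math
import random


def queries_needed(epsilon, delta):
    """Number of validity queries so that, by Hoeffding's inequality, the
    empirical invalid fraction is within epsilon of the true invalid mass
    with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_invalid_mass(sample, is_valid, n_queries):
    """Draw n_queries examples from the learned distribution and ask the
    validity oracle about each one; return the fraction judged invalid."""
    invalid = sum(1 for _ in range(n_queries) if not is_valid(sample()))
    return invalid / n_queries


# Toy setup (assumption, for illustration only): the "learned distribution"
# is uniform on [0, 1), and an example is valid exactly when x < 0.9,
# so the true invalid mass is 0.1.
rng = random.Random(0)
n = queries_needed(epsilon=0.05, delta=0.05)
estimate = estimate_invalid_mass(rng.random, lambda x: x < 0.9, n)
print(f"{n} validity queries, estimated invalid mass {estimate:.3f}")
```

This only audits a fixed distribution after the fact. The learning problem in the abstract is harder, since the learner must use such queries to steer what it outputs.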

Videos

Distribution Learning with Valid Outputs Beyond the Worst-Case · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning