Learning the Structure of Generative Models without Labeled Data
Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré

TL;DR
This paper introduces a new method for automatically estimating the dependency structure of generative models used in weak supervision, improving efficiency and accuracy without labeled data.
Contribution
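It proposes an $\ell_1$-regularized marginal pseudolikelihood approach to structure learning that is faster than a maximum likelihood approach and selects fewer extraneous dependencies. For a broad class of models, the amount of unlabeled data it needs scales sublinearly in the number of possible dependencies.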
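As a rough illustration in our own notation (an assumption, not necessarily the paper's exact formulation): with $m$ unlabeled examples, labeling-function outputs $\Lambda_{ij}$, a latent true label $Y$, and dependency weights $\theta_{jk}$ between labeling functions $j$ and $k$, the per-function objective conditions $\Lambda_{ij}$ on the other outputs, sums out $Y$, and adds an $\ell_1$ penalty. Dependencies whose estimated weights exceed a small threshold are then kept.

$$\hat{\theta}^{(j)} = \arg\min_{\theta} \; -\sum_{i=1}^{m} \log \sum_{y \in \{-1, 1\}} p_\theta\left(\Lambda_{ij}, Y = y \mid \Lambda_{i,\setminus j}\right) + \epsilon \sum_{k \neq j} \lvert \theta_{jk} \rvert$$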
Findings
The method is 100× faster than a maximum likelihood approach in simulations.
Selects 1/4 as many extraneous dependencies, reducing false positives.
Improves F1 by an average of 1.5 points on real-world data.
Abstract
Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model's dependency structure directly affects the quality of the estimated labels, but selecting a structure automatically without any labeled data is a distinct challenge. We propose a structure estimation method that maximizes the $\ell_1$-regularized marginal pseudolikelihood of the observed data. Our analysis shows that the amount of unlabeled data required to identify the true structure scales sublinearly in the number of possible dependencies for a broad class of models. Simulations show that our method is 100× faster than a maximum likelihood approach and selects 1/4 as many extraneous dependencies. We also show that our method provides an average…
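The sketch below is a simplified, pure-Python illustration of the idea, not the authors' implementation. It assumes a toy factor graph with only accuracy factors $1\{\lambda_k = y\}$ and pairwise correlation factors $1\{\lambda_j = \lambda_k\}$. For each labeling function it minimizes the negative marginal pseudolikelihood (with $Y$ summed out exactly) by proximal stochastic gradient descent, applying the $\ell_1$ penalty to correlation weights only, and keeps pairs whose weight exceeds a threshold. All function names, hyperparameters (`reg`, `step`, `epochs`, `eps`, `init_acc`) and the synthetic data generator are hypothetical.

```python
import math
import random

VALUES = (-1, 0, 1)  # labeling-function outputs: negative, abstain, positive
CLASSES = (-1, 1)    # values of the latent true label Y


def features(lam, j, lj, y):
    """Factors touching LF j or Y, with LF j's output replaced by lj.

    First n entries: accuracy factors 1{lambda_k == y} for every LF k.
    Last n-1 entries: correlation factors 1{lj == lambda_k} for k != j.
    """
    n = len(lam)
    acc = [1.0 if (lj if k == j else lam[k]) == y else 0.0 for k in range(n)]
    cor = [1.0 if lj == lam[k] else 0.0 for k in range(n) if k != j]
    return acc + cor


def neg_log_pl_grad(theta, lam, j):
    """Gradient of -log p_theta(lambda_j | lambda_rest) with Y marginalized out."""
    configs = [(lj, y, features(lam, j, lj, y)) for lj in VALUES for y in CLASSES]
    scores = [sum(t * f for t, f in zip(theta, phi)) for _, _, phi in configs]
    top = max(scores)  # shift scores for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z_all = sum(weights)
    z_obs = sum(w for (lj, _, _), w in zip(configs, weights) if lj == lam[j])
    grad = [0.0] * len(theta)
    for (lj, _, phi), w in zip(configs, weights):
        # Model expectation over (lj, y) minus posterior expectation over y given observed lj.
        coef = w / z_all - (w / z_obs if lj == lam[j] else 0.0)
        for i, f in enumerate(phi):
            grad[i] += coef * f
    return grad


def learn_structure(data, reg=0.01, step=0.01, epochs=5, eps=0.1, init_acc=0.7):
    """Return (j, k) pairs, j < k, whose learned correlation weight exceeds eps."""
    n = len(data[0])
    selected = set()
    for j in range(n):
        # Accuracy weights start positive (assumes better-than-random LFs) to break
        # the label-flip symmetry; correlation weights start at zero.
        theta = [init_acc] * n + [0.0] * (n - 1)
        for _ in range(epochs):
            for lam in data:
                grad = neg_log_pl_grad(theta, lam, j)
                theta = [t - step * g for t, g in zip(theta, grad)]
                # Proximal step for the l1 penalty: soft-threshold correlation weights.
                for i in range(n, 2 * n - 1):
                    t = theta[i]
                    theta[i] = math.copysign(max(abs(t) - step * reg, 0.0), t)
        others = [k for k in range(n) if k != j]
        for k, w in zip(others, theta[n:]):
            if abs(w) > eps:
                selected.add((min(j, k), max(j, k)))
    return selected


def simulate(m, n, seed=0):
    """Hypothetical toy data: the last LF duplicates the second-to-last one."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        y = rng.choice(CLASSES)
        lam = []
        for _ in range(n - 1):
            if rng.random() < 0.8:  # LF votes with probability 0.8
                lam.append(y if rng.random() < 0.8 else -y)  # correct 80% of the time
            else:
                lam.append(0)  # abstain
        lam.append(lam[-1])  # exact copy, i.e. a strong dependency
        data.append(lam)
    return data


if __name__ == "__main__":
    print(learn_structure(simulate(500, 4)))
```

Because $Y$ and the held-out output are summed over only $2 \times 3$ configurations per example, each gradient step avoids the partition function over all joint assignments that a full maximum likelihood fit would require.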
Taxonomy
Topics: Music and Audio Processing · Machine Learning and Data Classification · Generative Adversarial Networks and Image Synthesis
