Regularized Data Programming with Automated Bayesian Prior Selection
Jacqueline R. M. A. Maasch, Hao Zhang, Qian Yang, Fei Wang, Volodymyr Kuleshov

TL;DR
This paper introduces a Bayesian regularization approach to data programming that automates prior selection. The approach improves dataset labeling accuracy, especially in low-data scenarios, and is more interpretable than traditional methods.
Contribution
It proposes a Bayesian extension of data programming with automated prior selection using majority vote as a proxy, addressing failures of unsupervised learning.
Findings
Regularized DP outperforms maximum likelihood and majority voting.
Improves performance in low-data regimes.
Enhances interpretability of the labeling process.
Abstract
The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation with informative priors. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves performance relative to maximum likelihood and majority voting, confers greater interpretability,…
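The idea in the abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's actual model: labels are binary, LFs vote in {-1, 0, +1} with 0 meaning abstain, LFs are conditionally independent given the true label, the class prior is uniform, and each LF has a single accuracy parameter with a Beta prior. The prior mean for each LF is set to its agreement rate with the unweighted majority vote, and the MAP estimate is fit with EM. The function names and the `strength` parameter (how many pseudo-observations the prior is worth) are hypothetical.

```python
import math


def majority_vote(votes):
    """Unweighted majority vote per example; ties and all-abstain rows give 0."""
    labels = []
    for row in votes:
        total = sum(row)
        labels.append(1 if total > 0 else -1 if total < 0 else 0)
    return labels


def mv_agreement(votes, mv_labels):
    """Fraction of each LF's non-abstain votes that agree with the majority vote."""
    rates = []
    for j in range(len(votes[0])):
        pairs = [(row[j], y) for row, y in zip(votes, mv_labels)
                 if row[j] != 0 and y != 0]
        agree = sum(1 for v, y in pairs if v == y)
        # Fall back to 0.5 (uninformative) when there is nothing to compare.
        rates.append(agree / len(pairs) if pairs else 0.5)
    return rates


def posterior_positive(row, acc):
    """P(y = +1 | votes), assuming conditional independence and a uniform class prior."""
    log_odds = 0.0
    for v, a in zip(row, acc):
        if v != 0:  # abstains are assumed to carry no information
            log_odds += v * math.log(a / (1.0 - a))
    return 1.0 / (1.0 + math.exp(-log_odds))


def clip(a, lo=0.05, hi=0.95):
    """Keep accuracies away from 0 and 1 so the log-odds stay finite."""
    return min(max(a, lo), hi)


def fit_map(votes, strength=5.0, n_iter=50):
    """EM with a MAP M-step; Beta prior means come from majority-vote agreement."""
    prior_mean = mv_agreement(votes, majority_vote(votes))
    acc = [clip(m) for m in prior_mean]
    for _ in range(n_iter):
        # E-step: posterior probability that each example is positive.
        q = [posterior_positive(row, acc) for row in votes]
        # M-step: mode of the Beta posterior with prior
        # Beta(1 + strength * m, 1 + strength * (1 - m)).
        new_acc = []
        for j, m in enumerate(prior_mean):
            correct, n = 0.0, 0
            for row, qi in zip(votes, q):
                if row[j] == 1:
                    correct += qi
                    n += 1
                elif row[j] == -1:
                    correct += 1.0 - qi
                    n += 1
            new_acc.append(clip((correct + strength * m) / (n + strength)))
        acc = new_acc
    return acc, [posterior_positive(row, acc) for row in votes]
```

Calling `fit_map(votes)` on a list of per-example vote rows returns the estimated LF accuracies and the probabilistic labels. Setting `strength=0` removes the prior and reduces the M-step to the maximum likelihood update, while a large `strength` pins the accuracies to their majority-vote agreement rates. The regularized estimate sits between those two extremes.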
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Advanced Bandit Algorithms Research
