Regularized Data Programming with Automated Bayesian Prior Selection

Jacqueline R. M. A. Maasch; Hao Zhang; Qian Yang; Fei Wang; Volodymyr Kuleshov

arXiv:2210.08677 · cs.LG · October 26, 2023


TL;DR

This paper introduces a Bayesian regularization approach to data programming that automates prior selection, improves labeling accuracy (especially in low-data scenarios), and makes the labeling process more interpretable than traditional methods.

Contribution

It proposes a Bayesian extension of data programming in which majority vote serves as a proxy signal for automatically selecting prior parameters, mitigating cases where unsupervised learning of the label model fails to outperform an unweighted majority vote.
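As a rough illustration of what a Bayesian extension adds to the DP objective, the equations below contrast maximum likelihood with maximum a posteriori (MAP) estimation. The notation is assumed for illustration and is not taken from the paper: Λ is the matrix of LF outputs, Y the latent true labels, and w the label-model parameters. The Gaussian prior in the last line is only one example of an informative prior; under it, the log-prior becomes an L2 penalty pulling w toward a prior mean μ, which is where a proxy signal such as majority vote could be used to choose μ.

```latex
\hat{w}_{\mathrm{MLE}} = \arg\max_{w} \; \log \sum_{Y} p_w(\Lambda, Y)

\hat{w}_{\mathrm{MAP}} = \arg\max_{w} \; \Big[ \log \sum_{Y} p_w(\Lambda, Y) + \log p(w) \Big]

p(w) = \mathcal{N}(\mu, \sigma^2 I)
\;\Longrightarrow\;
\log p(w) = -\frac{\lVert w - \mu \rVert_2^2}{2\sigma^2} + \text{const}
```

The second objective differs from the first only by the log-prior term, which acts as a regularizer; as σ² grows the prior flattens and MAP estimation reduces to maximum likelihood.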

Findings

01. Regularized DP outperforms maximum likelihood estimation and unweighted majority voting.

02. Regularized DP improves performance in low-data regimes.

03. Regularized DP enhances the interpretability of the labeling process.

Abstract

The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation with informative priors. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves performance relative to maximum likelihood and majority voting, confers greater interpretability,…
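The pipeline described in the abstract (LF outputs, a majority-vote proxy for prior parameters, then MAP estimation of the label model) can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes binary labels in {-1, +1} with 0 for abstain, conditionally independent LFs, and a simplified "one-coin" accuracy model (one accuracy parameter per LF, in the style of Dawid-Skene) in place of the paper's label model. It places a Beta prior on each LF's accuracy and fits it with MAP-EM. The function names `majority_vote`, `mv_beta_prior`, and `map_em`, the `strength` parameter, and the rule of centering each prior on the LF's agreement rate with majority vote are all hypothetical choices made for illustration.

```python
import numpy as np

def majority_vote(L):
    """Unweighted majority vote over LF outputs in {-1, 0, +1} (0 = abstain).

    Returns one label per row in {-1, 0, +1}; 0 marks a tie or an all-abstain row.
    """
    return np.sign(L.sum(axis=1)).astype(int)

def mv_beta_prior(L, strength=10.0):
    """Hypothetical heuristic: center a Beta prior on each LF's accuracy at its
    agreement rate with the majority vote. `strength` is the pseudo-count mass."""
    mv = majority_vote(L)
    covered = (L != 0) & (mv[:, None] != 0)          # LF voted and MV is decisive
    agree = ((L == mv[:, None]) & covered).sum(axis=0)
    n = covered.sum(axis=0)
    acc = (agree + 1.0) / (n + 2.0)                  # Laplace-smoothed agreement rate
    return 1.0 + strength * acc, 1.0 + strength * (1.0 - acc)

def map_em(L, a, b, n_iter=100, class_prior=0.5):
    """MAP-EM for a conditionally independent one-coin LF accuracy model.

    alpha[j] = P(LF j is correct | LF j does not abstain), with a Beta(a[j], b[j])
    prior. Returns (alpha, q), where q[i] = P(y_i = +1 | LF outputs of item i)
    from the last E-step.
    """
    alpha = a / (a + b)                              # start at the prior mean
    n_votes = (L != 0).sum(axis=0)
    for _ in range(n_iter):
        log_a, log_1ma = np.log(alpha), np.log1p(-alpha)
        # E-step: joint log-probability of each row's votes under y = +1 and y = -1.
        log_pos = (np.log(class_prior)
                   + np.where(L == 1, log_a, 0.0).sum(axis=1)
                   + np.where(L == -1, log_1ma, 0.0).sum(axis=1))
        log_neg = (np.log1p(-class_prior)
                   + np.where(L == -1, log_a, 0.0).sum(axis=1)
                   + np.where(L == 1, log_1ma, 0.0).sum(axis=1))
        q = 0.5 * (1.0 + np.tanh(0.5 * (log_pos - log_neg)))  # numerically stable sigmoid
        # M-step: expected number of correct votes per LF, combined with the Beta prior.
        correct = (np.where(L == 1, q[:, None], 0.0)
                   + np.where(L == -1, 1.0 - q[:, None], 0.0)).sum(axis=0)
        alpha = (correct + a - 1.0) / (n_votes + a + b - 2.0)  # mode of the Beta posterior
        alpha = np.clip(alpha, 1e-6, 1.0 - 1e-6)
    return alpha, q

if __name__ == "__main__":
    # Synthetic example: 4 LFs with decreasing accuracy, each voting 90% of the time.
    rng = np.random.default_rng(0)
    y = rng.choice([-1, 1], size=500)
    votes = rng.random((500, 4)) < 0.9
    right = rng.random((500, 4)) < np.array([0.9, 0.85, 0.8, 0.75])
    L = np.where(votes, np.where(right, y[:, None], -y[:, None]), 0)
    a, b = mv_beta_prior(L)
    alpha, q = map_em(L, a, b)
    print("estimated LF accuracies:", np.round(alpha, 3))
```

The Beta pseudo-counts play the role of the regularization term: when an LF has few non-abstain votes they dominate its accuracy update, and as its coverage grows the data take over. Setting `strength=0` would not recover plain maximum likelihood in this sketch (it makes `a + b - 2` zero in the denominator for LFs with no votes), so a small positive strength is assumed throughout.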


Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Advanced Bandit Algorithms Research