Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities

Verónica Álvarez, Santiago Mazuelas, Steven An, and Sanjoy Dasgupta

arXiv:2508.03896 · stat.ML · August 7, 2025

TL;DR

This paper introduces a new methodology for programmatic weak supervision that provides confidence intervals for label probabilities, improving the reliability of label predictions by accounting for uncertainties in weak labeling functions.

Contribution

The paper uses uncertainty sets of distributions to derive confidence intervals for label probabilities, so that the reliability of each weakly supervised prediction can be assessed.

Findings

01  Improves prediction reliability over state-of-the-art methods.

02  Provides practical confidence intervals for label probabilities.

03  Demonstrates effectiveness on multiple benchmark datasets.

Abstract

The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that provide rough guesses for labels. Weak LFs commonly provide guesses with assorted types and unknown interdependences that can result in unreliable predictions. Furthermore, existing techniques for programmatic weak supervision cannot provide assessments for the reliability of the probabilistic predictions for labels. This paper presents a methodology for programmatic weak supervision that can provide confidence intervals for label probabilities and obtain more reliable predictions. In particular, the methods proposed use uncertainty sets of distributions that encapsulate the information provided by LFs with unrestricted behavior and typology.…
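
To make the setting in the abstract concrete, the sketch below shows what weak labeling functions can look like and how their guesses can be combined into a probabilistic label. It is a plain vote-fraction baseline, not the uncertainty-set method the paper proposes, and it produces no confidence intervals. The toy spam task, the `ABSTAIN` convention, and all function names are assumptions made for illustration.

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: an LF returns -1 when it offers no guess

# Hypothetical labeling functions for a toy spam task (1 = spam, 0 = not spam).
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text):
    return 0 if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return 1 if text.count("!") >= 2 else ABSTAIN

LFS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]

def vote_probabilities(text, n_classes=2):
    """Fraction of non-abstaining LF votes per class; uniform if all abstain."""
    votes = [lf(text) for lf in LFS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    total = sum(counts.values())
    if total == 0:
        return [1.0 / n_classes] * n_classes
    return [counts[c] / total for c in range(n_classes)]

if __name__ == "__main__":
    for example in ["Free prize!!", "Meeting at noon", "Free meeting"]:
        print(example, vote_probabilities(example))
```

A baseline like this treats every labeling function as equally accurate and independent, which is the kind of assumption the abstract identifies as a source of unreliable predictions when LFs have unknown interdependences.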
