Weak Supervision with Incremental Source Accuracy Estimation
Richard Gresham Correro

TL;DR
This paper presents an incremental method for estimating the dependency structure and accuracy of weak supervision sources in real time, enabling label generation for streaming data with accuracy comparable to offline methods.
Contribution
It introduces a novel incremental approach for dependency and accuracy estimation of weak supervision sources, suitable for real-time data labeling.
Findings
Generates probabilistic labels with accuracy comparable to existing offline methods.
Works with both classification models and heuristic functions as sources.
Effectively updates source accuracy estimates as new data arrives.
Abstract
Motivated by the desire to generate labels for real-time data, we develop a method to estimate the dependency structure and accuracy of weak supervision sources incrementally. Our method first estimates the dependency structure associated with the supervision sources and then uses this structure to iteratively update the estimated source accuracies as new data are received. Using both off-the-shelf classification models trained on publicly available datasets and heuristic functions as supervision sources, we show that our method generates probabilistic labels with an accuracy matching that of existing offline methods.
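To make the idea of updating source accuracies as data arrives concrete, here is a minimal sketch. It is not the paper's algorithm: it swaps in a simple triplet-style moment estimator and skips the dependency-structure step entirely by assuming exactly three conditionally independent sources. It also assumes binary labels and votes in {-1, +1}, a balanced class prior, and sources that each do better than chance. The names `IncrementalTripletEstimator` and `simulate` are hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import combinations


class IncrementalTripletEstimator:
    """Running accuracy estimates for three conditionally independent sources.

    Assumption (not from the paper): votes and the latent label Y are in
    {-1, +1}, so that E[l_i * l_j] = a_i * a_j with a_i = E[l_i * Y].
    """

    def __init__(self):
        self.n = 0
        # Running means of the pairwise vote products E[l_i * l_j].
        self.moments = {pair: 0.0 for pair in combinations(range(3), 2)}

    def update(self, votes):
        """Fold one observation (three +/-1 votes) into the running means."""
        self.n += 1
        for (i, j) in self.moments:
            product = votes[i] * votes[j]
            self.moments[(i, j)] += (product - self.moments[(i, j)]) / self.n

    def _moment(self, i, j):
        return self.moments[(min(i, j), max(i, j))]

    def accuracies(self):
        """Return estimated P(l_i == Y) for each source."""
        result = []
        for i in range(3):
            j, k = [s for s in range(3) if s != i]
            denom = self._moment(j, k)
            if abs(denom) < 1e-12:
                result.append(0.5)  # no usable signal yet
                continue
            # |a_i| = sqrt(|M_ij * M_ik / M_jk|); sign taken as positive
            # under the better-than-chance assumption.
            a_i = math.sqrt(abs(self._moment(i, j) * self._moment(i, k) / denom))
            result.append((1.0 + min(a_i, 1.0)) / 2.0)
        return result


def simulate(true_acc, n_samples, seed=0):
    """Stream synthetic votes with known accuracies through the estimator."""
    rng = random.Random(seed)
    estimator = IncrementalTripletEstimator()
    for _ in range(n_samples):
        y = rng.choice([-1, 1])
        votes = [y if rng.random() < p else -y for p in true_acc]
        estimator.update(votes)
    return estimator.accuracies()


if __name__ == "__main__":
    print(simulate([0.9, 0.8, 0.7], 20000))
```

Each call to `update` costs constant time and memory, which is the property that matters for streaming data: the estimator never revisits earlier observations. A full method would also need the dependency-structure step and a rule for turning the accuracies into probabilistic labels, neither of which is attempted here.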
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Topic Modeling
