Learning from Multiple Noisy Partial Labelers
Peilin Yu, Tiffany Ding, Stephen H. Bach

TL;DR
This paper introduces a probabilistic generative model for programmatic weak supervision that supports partial labelers, heuristics that output a subset of classes rather than a single class. Partial labelers improve accuracy on text and image classification, and a scalable learning algorithm makes the model practical on large datasets.
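To make "partial labeler" concrete, here is a minimal Python sketch. The label space, the attribute names (`fur`, `large`), and the `PartialLabeler` class are hypothetical illustrations, not the authors' API. The sketch also assumes two things the summary does not state: each labeler splits the classes into disjoint groups, and it may abstain by returning `None`.

```python
from typing import Callable, FrozenSet, Optional, Sequence

# Hypothetical label space for illustration only.
CLASSES = frozenset({"cat", "dog", "car", "truck"})


class PartialLabeler:
    """A hypothetical partial labeler.

    It splits CLASSES into disjoint groups and, for each input, votes for one
    group (a subset of classes) or abstains by returning None.
    """

    def __init__(self, name: str, groups: Sequence[FrozenSet[str]],
                 rule: Callable[[dict], Optional[int]]):
        members = [c for g in groups for c in g]
        # Each class must appear in exactly one group.
        if len(members) != len(CLASSES) or set(members) != CLASSES:
            raise ValueError(f"{name}: groups must partition the label space")
        self.name = name
        self.groups = list(groups)
        self.rule = rule

    def __call__(self, example: dict) -> Optional[FrozenSet[str]]:
        idx = self.rule(example)
        return None if idx is None else self.groups[idx]


# Attribute-style heuristics: each narrows the label to a subset, not a single class.
has_fur = PartialLabeler(
    "has_fur",
    [frozenset({"cat", "dog"}), frozenset({"car", "truck"})],
    lambda x: None if "fur" not in x else (0 if x["fur"] else 1),
)
is_large = PartialLabeler(
    "is_large",
    [frozenset({"dog", "truck"}), frozenset({"cat", "car"})],
    lambda x: None if "large" not in x else (0 if x["large"] else 1),
)

x = {"fur": True, "large": True}
votes = [lf(x) for lf in (has_fur, is_large)]
# Naive intersection of non-abstaining votes. Real labelers are noisy and can
# conflict, which is why a model that estimates each labeler's accuracy is needed.
candidates = CLASSES.intersection(*[v for v in votes if v is not None])
print(sorted(candidates))  # ['dog']
```

Neither labeler alone identifies the class, but together they do when both are correct; the generative model in the paper exists to handle the case where they are not.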
Contribution
It extends existing frameworks to handle partial labelers, provides a scalable learning algorithm, and proves that the model is generically identifiable up to label swapping under mild conditions.
Findings
Adding partial labels improves text classification accuracy by 8.6 percentage points.
Learning scales to 100k examples in one minute, a 300x speedup over a naive implementation.
The framework achieves zero-shot object classification performance comparable to recent embedding-based methods.
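The abstract does not detail how learning was scaled up. As a generic illustration only, not necessarily the authors' technique: in any model of this kind where each example's only observed data are its votes, the E-step for an example depends on nothing but its vote vector. With m labelers that each choose among G groups or abstain, there are at most (G+1)^m distinct vectors, so per-example work can be done once per unique pattern and weighted by its count. The helper names and the stand-in statistic below are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Pattern = Tuple[Optional[int], ...]


def compress_votes(votes: Sequence[Sequence[Optional[int]]]) -> List[Tuple[Pattern, int]]:
    """Collapse identical vote vectors into (unique pattern, count) pairs."""
    return list(Counter(tuple(row) for row in votes).items())


def mean_over_examples(pairs: List[Tuple[Pattern, int]],
                       f: Callable[[Pattern], float]) -> float:
    """Average f over every original example while calling f once per unique pattern.

    In EM, f would be a per-example E-step quantity (e.g. a posterior term),
    which depends on an example only through its vote vector.
    """
    total = sum(count for _, count in pairs)
    return sum(count * f(p) for p, count in pairs) / total


def share_voting_group0(p: Pattern) -> float:
    """Hypothetical stand-in for an E-step term: share of non-abstaining votes for group 0."""
    cast = [g for g in p if g is not None]
    return sum(g == 0 for g in cast) / len(cast) if cast else 0.0


# Synthetic votes: 6 labelers, each voting group 0, group 1, or abstaining (None).
rng = random.Random(0)
votes = [[rng.choice([None, 0, 1]) for _ in range(6)] for _ in range(100_000)]
pairs = compress_votes(votes)

slow = sum(share_voting_group0(tuple(row)) for row in votes) / len(votes)
fast = mean_over_examples(pairs, share_voting_group0)
print(len(pairs), "unique patterns for", len(votes), "examples")
print(abs(slow - fast) < 1e-9)  # True
```

After one counting pass, each EM iteration costs time proportional to the number of distinct patterns (at most 3^6 = 729 here) instead of the number of examples, which helps most when there are few labelers relative to the dataset size.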
Abstract
Programmatic weak supervision creates models without hand-labeled training data by combining the outputs of heuristic labelers. Existing frameworks make the restrictive assumption that labelers output a single class label. Enabling users to create partial labelers that output subsets of possible class labels would greatly expand the expressivity of programmatic weak supervision. We introduce this capability by defining a probabilistic generative model that can estimate the underlying accuracies of multiple noisy partial labelers without ground truth labels. We show how to scale up learning, for example learning on 100k examples in one minute, a 300x speedup compared to a naive implementation. We also prove that this class of models is generically identifiable up to label swapping under mild conditions. We evaluate our framework on three text classification and six object classification…
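The abstract does not spell out the model, so the following is only a simplified stand-in written for illustration, not the authors' parameterization. It assumes each labeler partitions the classes into groups, votes for the group containing the true class with an unknown accuracy α_j and otherwise picks one of its other groups uniformly, abstains independently of the label, and is conditionally independent of the other labelers given the label. Under those assumptions, expectation-maximization (EM) estimates the accuracies from the votes alone, without ground truth labels. The function names, partitions, and synthetic data are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Set


def fit_partial_label_model(votes: Sequence[Sequence[Optional[int]]],
                            groups: Sequence[Sequence[Set[int]]],
                            num_classes: int, iters: int = 40,
                            init_acc: float = 0.7):
    """EM for a simplified, hypothetical partial-labeler model.

    votes[i][j]: index of the group labeler j chose for example i, or None (abstain).
    groups[j]:   labeler j's partition of range(num_classes) into >= 2 groups.
    Returns (class_prior, accuracies, posteriors from the last E-step).
    """
    m, eps = len(groups), 1e-6
    assert all(len(gs) >= 2 for gs in groups)
    member = [[[k in g for k in range(num_classes)] for g in gs] for gs in groups]
    prior = [1.0 / num_classes] * num_classes
    acc = [init_acc] * m
    post: List[List[float]] = []
    for _ in range(iters):
        # log P(vote = g | y = k): acc if k is in group g, else the rest spread evenly.
        logp = []
        for j in range(m):
            hit = math.log(acc[j])
            miss = math.log((1.0 - acc[j]) / (len(groups[j]) - 1))
            logp.append([[hit if inside else miss for inside in row] for row in member[j]])
        log_prior = [math.log(p) for p in prior]
        # E-step: posterior over each example's unobserved true class.
        post = []
        for row in votes:
            scores = list(log_prior)
            for j, g in enumerate(row):
                if g is not None:  # abstentions carry no information in this model
                    for k in range(num_classes):
                        scores[k] += logp[j][g][k]
            top = max(scores)
            w = [math.exp(s - top) for s in scores]
            z = sum(w)
            post.append([x / z for x in w])
        # M-step: closed-form updates (smoothed prior, clipped accuracies).
        n = len(votes)
        prior = [(sum(q[k] for q in post) + eps) / (n + num_classes * eps)
                 for k in range(num_classes)]
        for j in range(m):
            agree, total = 0.0, 0
            for row, q in zip(votes, post):
                g = row[j]
                if g is not None:
                    agree += sum(qk for qk, inside in zip(q, member[j][g]) if inside)
                    total += 1
            if total:
                acc[j] = min(1.0 - eps, max(eps, agree / total))
    return prior, acc, post


def simulate(groups, true_acc, num_classes, n, abstain=0.3, seed=0):
    """Draw synthetic votes from the same hypothetical model, with a uniform class prior."""
    rng = random.Random(seed)
    labels, votes = [], []
    for _ in range(n):
        y = rng.randrange(num_classes)
        row = []
        for gs, a in zip(groups, true_acc):
            if rng.random() < abstain:
                row.append(None)
                continue
            right = next(i for i, g in enumerate(gs) if y in g)
            wrong = [i for i in range(len(gs)) if i != right]
            row.append(right if rng.random() < a else rng.choice(wrong))
        labels.append(y)
        votes.append(row)
    return labels, votes


# Three hypothetical binary partitions of 4 classes, each used by two labelers.
A, B, C = [{0, 1}, {2, 3}], [{1, 3}, {0, 2}], [{0, 3}, {1, 2}]
GROUPS = [A, B, C, A, B, C]
TRUE_ACC = [0.9, 0.8, 0.75, 0.7, 0.85, 0.65]
labels, votes = simulate(GROUPS, TRUE_ACC, num_classes=4, n=3000)
prior, acc, post = fit_partial_label_model(votes, GROUPS, num_classes=4)
pred = [max(range(4), key=q.__getitem__) for q in post]
print("estimated accuracies:", [round(a, 2) for a in acc])
print("agreement with hidden labels:", sum(p == y for p, y in zip(pred, labels)) / len(labels))
```

Starting the accuracies above chance (`init_acc=0.7`) is a design choice: in this toy model, relabeling classes 0↔2 and 1↔3 while flipping the accuracies of the A and C labelers to 1 - α fits the votes equally well, a small instance of why identifiability can only hold up to label swapping.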
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Topic Modeling
