Learning from Multiple Noisy Partial Labelers

Peilin Yu; Tiffany Ding; Stephen H. Bach

arXiv:2106.04530 · cs.LG · March 28, 2022 · 1 citation

TL;DR

This paper introduces a probabilistic model for programmatic weak supervision that supports partial labelers, which output subsets of possible class labels rather than a single label. The model improves accuracy on text classification, scales to large datasets, and is evaluated on both text and object classification tasks.

Contribution

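The paper extends existing programmatic weak supervision frameworks to handle partial labelers, provides a scalable learning algorithm, and proves that the resulting class of models is generically identifiable up to label swapping under mild conditions.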

Findings

01

Adding partial labels improves text classification accuracy by 8.6 percentage points.

02

Learning scales to 100k examples in one minute, a 300x speedup over a naive implementation.

03

The framework achieves zero-shot object classification performance comparable to recent embedding-based methods.

Abstract

Programmatic weak supervision creates models without hand-labeled training data by combining the outputs of heuristic labelers. Existing frameworks make the restrictive assumption that labelers output a single class label. Enabling users to create partial labelers that output subsets of possible class labels would greatly expand the expressivity of programmatic weak supervision. We introduce this capability by defining a probabilistic generative model that can estimate the underlying accuracies of multiple noisy partial labelers without ground truth labels. We show how to scale up learning, for example learning on 100k examples in one minute, a 300x speed up compared to a naive implementation. We also prove that this class of models is generically identifiable up to label swapping under mild conditions. We evaluate our framework on three text classification and six object classification…
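To make the notion of a partial labeler concrete, the following is a minimal Python sketch, not the paper's method. It assumes a hypothetical three-class task and three hypothetical heuristics (`has_stripes`, `has_hooves`, `mentions_savanna`), each returning a subset of class labels or `None` to abstain. The aggregation shown is plain unweighted voting over subsets, used here only as a naive baseline; the paper instead fits a probabilistic generative model that estimates each labeler's accuracy without ground truth.

```python
from typing import Callable, FrozenSet, List, Optional

# Hypothetical class set, chosen only for illustration.
CLASSES = ("horse", "zebra", "tiger")

# A partial label is a subset of CLASSES; None means the labeler abstains.
PartialLabel = Optional[FrozenSet[str]]
Labeler = Callable[[dict], PartialLabel]


def has_stripes(example: dict) -> PartialLabel:
    # Stripes narrow the label to {zebra, tiger} without picking one class.
    if example.get("striped"):
        return frozenset({"zebra", "tiger"})
    return frozenset({"horse"})


def has_hooves(example: dict) -> PartialLabel:
    # Hooves narrow the label to {horse, zebra}.
    if example.get("hooves"):
        return frozenset({"horse", "zebra"})
    return frozenset({"tiger"})


def mentions_savanna(example: dict) -> PartialLabel:
    # A heuristic that only fires sometimes and abstains otherwise.
    if "savanna" in example.get("text", ""):
        return frozenset({"zebra"})
    return None


def subset_vote(example: dict, labelers: List[Labeler]) -> List[str]:
    """Naive baseline (an assumption, not the paper's model): each
    non-abstaining labeler gives one vote to every class in its subset.
    Returns all classes tied for the most votes."""
    votes = {c: 0 for c in CLASSES}
    for labeler in labelers:
        output = labeler(example)
        if output is None:
            continue
        for c in output:
            votes[c] += 1
    best = max(votes.values())
    return [c for c in CLASSES if votes[c] == best]


LABELERS = [has_stripes, has_hooves, mentions_savanna]
example = {"striped": True, "hooves": True, "text": ""}
print(subset_vote(example, LABELERS))
```

In this sketch, no single labeler identifies the class on its own, but the intersection of their subsets does. Unweighted voting treats every labeler as equally reliable, which is the limitation that a learned accuracy model is meant to address.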

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Topic Modeling