Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming

Ayush Maheshwari; Krishnateja Killamsetty; Ganesh Ramakrishnan; Rishabh Iyer; Marina Danilevsky; Lucian Popa

arXiv:2109.11410·cs.LG·March 11, 2022

TL;DR

This paper introduces a semi-supervised learning framework that reweights labeling functions using a joint model and bi-level optimization, significantly improving data programming performance in text classification.

Contribution

It proposes a novel reweighting framework that leverages both labeled and unlabeled data to improve the quality of labeling functions in data programming.

Findings

01

Outperforms prior methods on multiple text classification datasets

02

Effectively reweights noisy labeling functions for better semi-supervised learning

03

Demonstrates robustness against noisy and limited labeled data

Abstract

A critical bottleneck in supervised machine learning is the need for large amounts of labeled data, which is expensive and time-consuming to obtain. However, it has been shown that a small amount of labeled data, while insufficient to re-train a model, can be effectively used to generate human-interpretable labeling functions (LFs). These LFs, in turn, have been used to generate a large amount of additional noisy labeled data, in a paradigm that is now commonly referred to as data programming. However, previous approaches to automatically generating LFs make no attempt to further use the given labeled data for model training, thus giving up opportunities for improved performance. Moreover, since the LFs are generated from a relatively small labeled dataset, they are prone to being noisy, and naively aggregating these LFs can lead to very poor performance in practice. In this work, we…
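
To make the terms in the abstract concrete, here is a minimal Python sketch of what labeling functions and their aggregation can look like. It is an illustration only, not the paper's method: the three keyword rules, the label names "spam" and "ham", and the hand-set weights are all hypothetical, and the weights are fixed by hand rather than learned from labeled and unlabeled data.

```python
from collections import Counter

ABSTAIN = None  # an LF returns this when its rule does not fire

# Hypothetical labeling functions: cheap, human-readable, noisy rules.
def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text):
    return "ham" if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 2 else ABSTAIN

LFS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]

def majority_vote(text, lfs=LFS):
    """Naive aggregation: every LF that fires gets one equal vote."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

def weighted_vote(text, weights, lfs=LFS):
    """Weighted aggregation: each LF's vote counts in proportion to its weight.

    The weights here are assumed inputs; learning them is outside this sketch.
    """
    scores = {}
    for lf, w in zip(lfs, weights):
        v = lf(text)
        if v is not ABSTAIN:
            scores[v] = scores.get(v, 0.0) + w
    if not scores:
        return ABSTAIN
    return max(scores, key=scores.get)

if __name__ == "__main__":
    text = "Free slides from the meeting"
    # Two LFs fire and disagree, so equal votes cannot separate them,
    # while down-weighting the noisier keyword rule lets the other LF decide.
    print(weighted_vote(text, weights=[0.2, 0.9, 0.5]))
```

With equal votes, conflicting LFs on the same example leave the outcome to an arbitrary tie-break, which is one way naive aggregation of noisy LFs can go wrong. Assigning each LF a weight is the general idea behind reweighting.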

Code & Models

Repositories

ayushbits/robust-aggregate-lfs
PyTorch · Official

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Topic Modeling