Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming
Ayush Maheshwari, Krishnateja Killamsetty, Ganesh Ramakrishnan, Rishabh Iyer, Marina Danilevsky, Lucian Popa

TL;DR
This paper introduces a semi-supervised learning framework that reweights labeling functions using a joint model and bi-level optimization, significantly improving data programming performance in text classification.
Contribution
It proposes a novel reweighting framework that leverages both labeled and unlabeled data to improve the quality of labeling functions in data programming.
Findings
Outperforms prior methods on multiple text classification datasets
Effectively reweights noisy labeling functions for better semi-supervised learning
Demonstrates robustness against noisy and limited labeled data
Abstract
A critical bottleneck in supervised machine learning is the need for large amounts of labeled data, which is expensive and time-consuming to obtain. However, it has been shown that a small amount of labeled data, while insufficient to re-train a model, can be effectively used to generate human-interpretable labeling functions (LFs). These LFs, in turn, have been used to generate a large amount of additional noisy labeled data, in a paradigm that is now commonly referred to as data programming. However, previous approaches to automatically generate LFs make no attempt to further use the given labeled data for model training, thus giving up opportunities for improved performance. Moreover, since the LFs are generated from a relatively small labeled dataset, they are prone to being noisy, and naively aggregating these LFs can lead to very poor performance in practice. In this work, we…
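To make the data programming idea in the abstract concrete, here is a minimal Python sketch of labeling functions and two ways of aggregating their votes. It is an illustration only and is not the paper's method: the three keyword rules, the "spam"/"ham" labels, and the per-LF weights are all hypothetical, and the weights are set by hand here rather than learned through bi-level optimization.

```python
from collections import Counter

ABSTAIN = None  # an LF returns this when its rule does not fire

# Hypothetical labeling functions: each is a small, human-readable rule.
def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text):
    return "ham" if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 2 else ABSTAIN

LFS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]

def majority_vote(text, lfs=LFS):
    """Naive aggregation: every LF that fires gets one equal vote."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

def weighted_vote(text, weights, lfs=LFS):
    """Reweighted aggregation: each LF's vote counts in proportion to its weight.

    The weights are assumed to be given; a noisy LF can be down-weighted
    so that it no longer dominates the aggregated label.
    """
    scores = {}
    for lf, w in zip(lfs, weights):
        v = lf(text)
        if v is not ABSTAIN:
            scores[v] = scores.get(v, 0.0) + w
    if not scores:
        return ABSTAIN
    return max(scores, key=scores.get)
```

With equal votes, a single noisy rule can decide the label of an example on its own. With weights, `weighted_vote("Free meeting", [0.2, 0.9, 0.5])` lets the more trusted "meeting" rule outvote the less trusted "free" rule.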
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Topic Modeling
