The Word is Mightier than the Label: Learning without Pointillistic Labels using Data Programming

Chufan Gao; Mononito Goswami

arXiv:2108.10921·cs.LG·August 27, 2021

Open Access

TL;DR

This paper surveys weak supervision and examines the Data Programming (DP) framework, which combines potentially noisy heuristics into denoised probabilistic labels for text classification. This reduces reliance on manual point-by-point labeling while keeping the resulting classifiers competitive.

Contribution

The paper analyzes the mathematical foundations of Data Programming and empirically compares it with pointillistic active and semi-supervised learning methods.

Findings

01

DP effectively denoises heuristic labels for text classification

02

Compared to traditional methods, DP reduces labeling effort and maintains competitive accuracy

03

DP is applicable to real-world text datasets

Abstract

Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling vast amounts of data can be tedious, expensive, and error-prone. Recently, some studies have explored the use of diverse sources of weak supervision to produce competitive end model classifiers. In this paper, we survey recent work on weak supervision and, in particular, investigate the Data Programming (DP) framework. Taking a set of potentially noisy heuristics as input, DP assigns denoised probabilistic labels to each data point in a dataset using a probabilistic graphical model of heuristics. We analyze the mathematical foundations of DP and demonstrate its power by applying it to two real-world text classification tasks. Furthermore, we compare DP with pointillistic active and semi-supervised learning techniques traditionally applied in…
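To make the idea of turning noisy heuristics into probabilistic labels concrete, here is a minimal Python sketch. It is not the authors' implementation. The three labeling functions, the spam-style example task, and the accuracy values are all hypothetical. The sketch also assumes the heuristic accuracies are already known and that heuristics vote independently, whereas DP estimates its label model from unlabeled data. Under those assumptions and a uniform class prior, the posterior for the positive class is a sigmoid of the accuracy-weighted sum of votes.

```python
import numpy as np

# Vote encoding (an assumption of this sketch): +1 positive, -1 negative, 0 abstain.
POS, NEG, ABSTAIN = 1, -1, 0

# Hypothetical labeling functions for a toy spam-vs-not-spam task.
def lf_contains_free(text):
    return POS if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text):
    return NEG if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return POS if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]

def apply_lfs(texts):
    """Return an (n_texts, n_lfs) matrix of votes in {-1, 0, +1}."""
    return np.array([[lf(t) for lf in LABELING_FUNCTIONS] for t in texts])

def probabilistic_labels(votes, accuracies):
    """Combine votes into P(y = +1 | votes).

    Assumes conditionally independent labeling functions, a uniform class
    prior, and abstentions that carry no information. Each non-abstaining
    vote contributes log(a / (1 - a)) to the log-odds, signed by the vote.
    """
    accuracies = np.asarray(accuracies, dtype=float)
    weights = np.log(accuracies / (1.0 - accuracies))
    log_odds = votes @ weights
    return 1.0 / (1.0 + np.exp(-log_odds))

texts = [
    "Win a FREE prize now!!",
    "Agenda for the meeting tomorrow",
    "hello",
]
assumed_accuracies = [0.8, 0.9, 0.6]  # hypothetical, not learned
votes = apply_lfs(texts)
probs = probabilistic_labels(votes, assumed_accuracies)
```

The resulting `probs` can serve as soft targets for training an end classifier. A text on which every heuristic abstains receives probability 0.5, which reflects that the heuristics provide no evidence for it.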

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Imbalanced Data Classification Techniques