Data Consistency for Weakly Supervised Learning

Chidubem Arachie; Bert Huang

arXiv:2202.03987 · cs.LG · February 9, 2022 · 1 citation

TL;DR

This paper introduces a data consistent weak supervision method that combines noisy weak signals with data features to produce accurate training labels for classification, without assuming a joint distribution over the weak signals and the true labels.

Contribution

The paper presents a novel weak supervision algorithm that leverages data consistency and feature information to handle noisy labels without distribution assumptions.

Findings

01. Outperforms state-of-the-art weak supervision methods on text classification

02. Effective in both image and text classification tasks

03. Handles low or no coverage weak signals successfully

Abstract

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals, while also considering features of the training data to produce accurate labels for training. Our method searches over classifiers of the data representation to find plausible labelings. We call this paradigm data consistent weak supervision. A key facet of our framework is that we are able to estimate labels for data examples with low or no coverage from the weak supervision. In addition, we make no assumptions about the joint distribution of the weak signals and true labels of the data. Instead, we use weak signals and the data features to solve a constrained…
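The idea of labeling uncovered examples through the data features can be illustrated with a small sketch. This is not the authors' algorithm: the abstract is cut off before it describes the constrained problem, so the code below swaps in a much simpler stand-in, a logistic model fit by gradient descent to the averaged weak signals on covered examples. The function name `fit_consistent_labels`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def fit_consistent_labels(X, weak, lr=0.5, steps=500):
    """Illustrative stand-in, NOT the paper's method.

    X:    (n, d) feature matrix.
    weak: (m, n) weak signals in [0, 1]; np.nan marks an abstention.
    Returns soft labels in [0, 1] for all n examples, including
    examples on which every weak signal abstains.
    """
    # An example is "covered" if at least one weak signal votes on it.
    covered = ~np.all(np.isnan(weak), axis=0)
    # Noisy target: mean of the non-abstaining weak signals per example.
    target = np.nanmean(weak[:, covered], axis=0)

    # Append a constant column so the model has a bias term.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])

    # Plain gradient descent on the cross-entropy loss, covered rows only.
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb[covered] @ w)))
        grad = Xb[covered].T @ (p - target) / covered.sum()
        w -= lr * grad

    # The fitted classifier labels every example through its features.
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))


# Toy data: the last two examples receive no weak-signal votes at all.
X = np.array([[-2.0], [-1.0], [1.0], [2.0], [-1.5], [1.5]])
weak = np.array([
    [0.0, 0.0, 1.0, 1.0, np.nan, np.nan],
    [0.0, np.nan, 1.0, np.nan, np.nan, np.nan],
])
labels = fit_consistent_labels(X, weak)
print(np.round(labels, 2))
```

In this toy setup the two uncovered examples still receive labels, because the classifier generalizes from the features of the covered ones. The paper's formulation, by contrast, is stated as a constrained problem over weak signals and features, the details of which are not available in the text above.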

Taxonomy

Topics: Machine Learning and Data Classification · Music and Audio Processing · Anomaly Detection Techniques and Applications