Data Consistency for Weakly Supervised Learning
Chidubem Arachie, Bert Huang

TL;DR
This paper introduces data consistent weak supervision, a method that combines noisy weak signals with data features to produce accurate training labels, improving performance on classification tasks without making assumptions about the joint distribution of the weak signals and the true labels.
Contribution
The paper presents a novel weak supervision algorithm that leverages data consistency and feature information to handle noisy labels without distribution assumptions.
Findings
Outperforms state-of-the-art weak supervision methods on text classification
Effective in both image and text classification tasks
Estimates labels for examples that have low or no coverage from the weak signals
Abstract
In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals, while also considering features of the training data to produce accurate labels for training. Our method searches over classifiers of the data representation to find plausible labelings. We call this paradigm data consistent weak supervision. A key facet of our framework is that we are able to estimate labels for data examples with low or no coverage from the weak supervision. In addition, we make no assumptions about the joint distribution of the weak signals and true labels of the data. Instead, we use weak signals and the data features to solve a constrained…
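The abstract is cut off before it states the constrained problem, so the following is not the authors' algorithm. It is a simplified sketch of the general idea only: fit a classifier on the data features so that its predictions agree with the weak signals wherever they vote, then use that classifier to label examples that no weak signal covers. The function name `fit_consistent_labels`, the use of NaN to mark "no coverage", the logistic model, the cross-entropy objective, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_consistent_labels(X, W, steps=2000, lr=0.1):
    """Hypothetical helper (not from the paper).

    X: (n, d) feature matrix.
    W: (m, n) weak signals in [0, 1]; np.nan marks no coverage.
    Returns an (n,) array of estimated probabilities of the positive class.
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])      # append a bias column
    theta = np.zeros(d + 1)
    covered = ~np.isnan(W)                    # which votes exist
    W0 = np.where(covered, W, 0.0)            # placeholder 0 where uncovered
    n_votes = covered.sum()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
        # Gradient of the mean cross-entropy between p and every covered vote.
        resid = (covered * (p[None, :] - W0)).sum(axis=0)
        theta -= lr * (Xb.T @ resid) / n_votes
    return 1.0 / (1.0 + np.exp(-(Xb @ theta)))

# Synthetic demo (assumed data): two well-separated clusters, three noisy
# weak signals with partial coverage, and 20 examples with no coverage at all.
rng = np.random.default_rng(0)
n, m = 200, 3
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) + 3.0 * (2 * y - 1)[:, None]
flips = rng.random((m, n)) < 0.2              # 20% of votes are wrong
votes = np.where(flips, 1 - y, y).astype(float)
W = np.where(rng.random((m, n)) < 0.5, votes, np.nan)
W[:, :20] = np.nan                            # first 20 examples: no coverage

p = fit_consistent_labels(X, W)
acc_uncovered = np.mean((p[:20] > 0.5) == (y[:20] == 1))
print("accuracy on uncovered examples:", acc_uncovered)
```

Because the classifier is a function of the features rather than of the votes, it produces a label for every example, including the ones the weak signals never touch. A majority vote over the weak signals alone cannot do that.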
Taxonomy
Topics: Machine Learning and Data Classification · Music and Audio Processing · Anomaly Detection Techniques and Applications
