ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision

Anastasiia Sedova; Benjamin Roth

arXiv:2204.06863·cs.LG·January 5, 2024

TL;DR

ULF is an unsupervised method that denoises weakly supervised data: it uses k-fold cross-validation over labeling functions to correct how those functions are assigned to classes, which yields more accurate annotations without manual labeling.

Contribution

The paper introduces ULF (Unsupervised Labeling Function correction), an algorithm that denoises weak supervision data by re-estimating the assignment of labeling functions to classes with k-fold cross-validation, improving label quality without manual labeling.

Findings

01. ULF improves weak supervision accuracy across multiple datasets.

02. ULF effectively corrects biases in labeling functions.

03. Enhanced data quality leads to better model performance.

Abstract

A cost-effective alternative to manual data labeling is weak supervision (WS), where data samples are automatically annotated using a predefined set of labeling functions (LFs), rule-based mechanisms that generate artificial labels for the associated classes. In this work, we investigate noise reduction techniques for WS based on the principle of k-fold cross-validation. We introduce a new algorithm ULF for Unsupervised Labeling Function correction, which denoises WS data by leveraging models trained on all but some LFs to identify and correct biases specific to the held-out LFs. Specifically, ULF refines the allocation of LFs to classes by re-estimating this assignment on highly reliable cross-validated samples. Evaluation on multiple datasets confirms ULF's effectiveness in enhancing WS learning without the need for manual labeling.
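
The cross-validation idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `threshold` and `keep` parameters, the rule for choosing training samples, the unshuffled fold split, and the toy nearest-centroid classifier are all assumptions made for illustration. `Z` is a samples-by-LFs match matrix and `T` is the LF-to-class assignment that gets re-estimated from confident out-of-fold predictions.

```python
import numpy as np


def weak_labels(Z, T):
    """Class probabilities from LF matches Z (n x m) and LF-to-class matrix T (m x c)."""
    votes = (Z @ T).astype(float)
    totals = votes.sum(axis=1, keepdims=True)
    return np.divide(votes, totals, out=np.zeros_like(votes), where=totals > 0)


def centroid_probs(X_train, y_train, X_test, n_classes):
    """Toy stand-in model: softmax over negative squared distances to class centroids."""
    logits = np.full((len(X_test), n_classes), -np.inf)
    for c in range(n_classes):
        members = X_train[y_train == c]
        if len(members):
            logits[:, c] = -((X_test - members.mean(axis=0)) ** 2).sum(axis=1)
    logits -= logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def refine_lf_assignment(X, Z, T, k=3, threshold=0.7, keep=0.3):
    """Re-estimate T from confident predictions of models that never saw the held-out LFs."""
    n_lfs, n_classes = T.shape
    counts = np.zeros((n_lfs, n_classes))
    # Assumption: folds are contiguous blocks of LFs (no shuffling) to keep the demo deterministic.
    for held_out in np.array_split(np.arange(n_lfs), k):
        hit_by_held_out = Z[:, held_out].sum(axis=1) > 0
        Z_train = Z.copy()
        Z_train[:, held_out] = 0
        # Assumption: train only on samples matched by training LFs and by no held-out LF.
        train = (~hit_by_held_out) & (Z_train.sum(axis=1) > 0)
        if not train.any() or not hit_by_held_out.any():
            continue
        y_train = weak_labels(Z_train[train], T).argmax(axis=1)
        probs = centroid_probs(X[train], y_train, X[hit_by_held_out], n_classes)
        confident = probs.max(axis=1) >= threshold
        pred = probs.argmax(axis=1)[confident]
        Z_test = Z[hit_by_held_out][confident]
        # Count, per held-out LF, which classes its confidently predicted samples fall into.
        for lf in held_out:
            matched = Z_test[:, lf] > 0
            counts[lf] += np.bincount(pred[matched], minlength=n_classes)
    totals = counts.sum(axis=1, keepdims=True)
    # LFs without confident samples keep their original row.
    T_hat = np.divide(counts, totals, out=T.astype(float).copy(), where=totals > 0)
    return keep * T + (1 - keep) * T_hat


# Toy data: class 0 lives near x=0, class 1 near x=10; each sample is matched by one LF.
X = np.array([[0], [1], [10], [11], [0], [1], [10], [11], [10], [11], [10], [11]], dtype=float)
Z = np.zeros((12, 6), dtype=int)
for lf in range(6):
    Z[2 * lf: 2 * lf + 2, lf] = 1
# LF 5 is assigned to class 0 although it fires on class-1-like samples.
T = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)

T_new = refine_lf_assignment(X, Z, T)
print(T_new.argmax(axis=1))  # LF 5 moves from class 0 to class 1
```

In this toy setup the five correctly assigned LFs keep their rows, while the row of the misassigned LF shifts toward class 1. The `keep` weight controls how strongly the original assignment is retained, so a value near 1 makes the correction more conservative.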

Code & Models

Repositories

knodle/knodle (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Rough Sets and Fuzzy Logic · Music and Audio Processing