CHEF: A Cheap and Fast Pipeline for Iteratively Cleaning Label Uncertainties (Technical Report)

Yinjun Wu; James Weimer; Susan B. Davidson

arXiv:2107.08588 · cs.DB · July 27, 2021

TL;DR

CHEF is a low-cost, fast pipeline for cleaning probabilistic (weak) labels in machine learning. It prioritizes the most influential training samples for human cleaning and uses incremental updates to speed up sample selection and model construction.

Contribution

The paper introduces CHEF, a label cleaning pipeline that lowers annotation cost by prioritizing the most influential training samples and accelerates the cleaning loop through incremental model updates.

Findings

01 Significant speed-ups in the label cleaning process.

02 Maintains high model prediction performance.

03 Reduces overall annotation costs.

Abstract

High-quality labels are expensive to obtain for many machine learning tasks, such as medical image classification tasks. Therefore, probabilistic (weak) labels produced by weak supervision tools are used to seed a process in which influential samples with weak labels are identified and cleaned by several human annotators to improve the model performance. To lower the overall cost and computational overhead of this process, we propose a solution called CHEF (CHEap and Fast label cleaning), which consists of the following three components. First, to reduce the cost of human annotators, we use Infl, which prioritizes the most influential training samples for cleaning and provides cleaned labels to save the cost of one human annotator. Second, to accelerate the sample selector phase and the model constructor phase, we use Increm-Infl to incrementally produce influential samples, and…
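
The idea of prioritizing influential samples for cleaning can be illustrated with a small sketch. The code below is not the paper's Infl method; it is a generic first-order influence estimate, assuming a 1-D logistic regression with L2 regularization, and every name in it (`fit`, `relabel_scores`, the toy data) is hypothetical. For each training sample it estimates how much the validation loss would change if the weak label were replaced by a hard label of 0 or 1, and ranks samples so that the most beneficial relabeling comes first.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, l2=0.01, lr=0.5, steps=2000):
    """Fit 1-D logistic regression (w, b) on probabilistic labels by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        gw = sum(e * x for e, x in zip(errs, xs)) / n + l2 * w
        gb = sum(errs) / n + l2 * b
        w, b = w - lr * gw, b - lr * gb
    return w, b

def relabel_scores(xs, ys, val_xs, val_ys, w, b, l2=0.01):
    """Estimate, per training sample, the best achievable change in validation
    loss from replacing its weak label with 0 or 1 (more negative is better)."""
    n, m = len(xs), len(val_xs)
    # Gradient of the mean validation loss with respect to (w, b).
    val_errs = [sigmoid(w * x + b) - y for x, y in zip(val_xs, val_ys)]
    gw = sum(e * x for e, x in zip(val_errs, val_xs)) / m
    gb = sum(val_errs) / m
    # 2x2 Hessian [[a, c], [c, d]] of the regularized training loss.
    curv = [sigmoid(w * x + b) * (1.0 - sigmoid(w * x + b)) for x in xs]
    a = sum(k * x * x for k, x in zip(curv, xs)) / n + l2
    c = sum(k * x for k, x in zip(curv, xs)) / n
    d = sum(curv) / n + l2
    det = a * d - c * c
    # v = H^{-1} g_val, using the closed-form inverse of a 2x2 matrix.
    vw = (d * gw - c * gb) / det
    vb = (a * gb - c * gw) / det
    scores = []
    for x, y in zip(xs, ys):
        proj = vw * x + vb
        # Relabeling y -> k shifts the training gradient by (y - k) * [x, 1] / n,
        # so the first-order change in validation loss is -(y - k) * proj / n.
        scores.append(min(-(y - k) * proj / n for k in (0.0, 1.0)))
    return scores

# Toy data (assumed): the last sample has a weak label that contradicts its feature.
train_x = [-2.0, -1.0, 1.0, 2.0, 1.5]
train_y = [0.1, 0.2, 0.8, 0.9, 0.1]
val_x, val_y = [-1.5, 1.5], [0.0, 1.0]

w, b = fit(train_x, train_y)
scores = relabel_scores(train_x, train_y, val_x, val_y, w, b)
ranking = sorted(range(len(train_x)), key=lambda i: scores[i])
print(ranking)
```

In a cleaning loop, the samples at the front of `ranking` would be sent to annotators first, the model would be refit on the corrected labels, and the scores would be recomputed. Recomputing everything from scratch each round is the cost that an incremental scheme would aim to avoid.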

Code & Models

Repositories

thuwuyinjun/Chef (PyTorch, official)