Pearls from Pebbles: Improved Confidence Functions for Auto-labeling

Harit Vishwakarma; Reid (Yi) Chen; Sui Jiet Tay; Satya Sai Srinath Namburi; Frederic Sala; Ramya Korlakai Vinayak

arXiv:2404.16188 · cs.LG · April 26, 2024


TL;DR

This paper introduces Colander, a post-hoc method that learns confidence functions for threshold-based auto-labeling. It aims to maximize coverage while keeping auto-labeling error low, and it significantly outperforms off-the-shelf calibration methods.

Contribution

The paper proposes a framework for studying the optimal confidence function for threshold-based auto-labeling and derives from it Colander, a new post-hoc method that addresses overconfident scores and improves auto-labeling performance.

Findings

01 Colander achieves up to 60% higher coverage than baselines.

02 It maintains auto-labeling error below 5%.

03 It uses the same amount of labeled data as existing methods.

Abstract

Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain Colander (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL…
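
To make the thresholding step described in the abstract concrete, here is a minimal Python sketch of one way it could work. It is an illustration under stated assumptions, not the paper's algorithm and not Colander itself: the function names `find_threshold` and `auto_label`, the toy scores, and the use of a raw empirical error rate on a small validation set are all hypothetical choices made for this example. A real TBAL pipeline may add further safeguards that the sketch omits.

```python
from typing import List, Optional, Sequence, Tuple


def find_threshold(
    conf: Sequence[float], correct: Sequence[bool], max_error: float
) -> Optional[float]:
    """Return the smallest validation score t such that the validation points
    with confidence >= t have empirical error <= max_error (None if no t works).

    Assumption: plain empirical error on held-out validation data.
    """
    # Visit validation points from most to least confident.
    order = sorted(range(len(conf)), key=lambda i: conf[i], reverse=True)
    best = None
    mistakes = 0
    for rank, i in enumerate(order, start=1):
        mistakes += 0 if correct[i] else 1
        # Evaluate a candidate threshold only at the end of a run of tied scores.
        last_of_tie = rank == len(order) or conf[order[rank]] < conf[i]
        if last_of_tie and mistakes / rank <= max_error:
            best = conf[i]
    return best


def auto_label(
    conf: Sequence[float], preds: Sequence[int], threshold: Optional[float]
) -> List[Tuple[int, int]]:
    """Return (index, predicted_label) for points at or above the threshold."""
    if threshold is None:
        return []
    return [(i, p) for i, (c, p) in enumerate(zip(conf, preds)) if c >= threshold]


# Hypothetical validation scores and whether each prediction was correct.
val_conf = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60]
val_correct = [True, True, True, False, True, False]
threshold = find_threshold(val_conf, val_correct, max_error=0.05)

# Hypothetical unlabeled pool: confidence scores and predicted classes.
pool_conf = [0.97, 0.85, 0.92, 0.40]
pool_preds = [1, 0, 2, 1]
labeled = auto_label(pool_conf, pool_preds, threshold)
coverage = len(labeled) / len(pool_conf)  # fraction of the pool auto-labeled
print(threshold, labeled, coverage)
```

In this toy setup, coverage is the fraction of the unlabeled pool that receives an automatic label. An overconfident model pushes incorrect predictions to the top of the ranking, which forces a higher threshold (or none at all) and lowers coverage. That is the failure mode the abstract attributes to poorly suited confidence functions.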


Videos

Pearls from Pebbles: Improved Confidence Functions for Auto-labeling · SlidesLive

Taxonomy

Topics: Pharmaceutical studies and practices · Biomedical Text Mining and Ontologies