Diversity Enhanced Active Learning with Strictly Proper Scoring Rules

Wei Tan; Lan Du; Wray Buntine

arXiv:2110.14171 · cs.LG · October 28, 2021 · 5 cites

TL;DR

The paper introduces an active learning acquisition function based on strictly proper scoring rules and reports, through theoretical analysis and extensive experiments, improved robustness and performance on text classification.

Contribution

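The paper proposes BEMPS (Bayesian Estimate of Mean Proper Scores), an acquisition function that ranks unlabelled examples by the expected increase in a strictly proper score, together with a batch selection method that promotes diversity, for active learning in text classification.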
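To make "strictly proper" concrete, the sketch below checks numerically that the two scores named in the abstract (log probability and negative squared error) are maximised in expectation by reporting the true class distribution. The function names and the example distributions are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np

def log_score(q, y):
    """Log score: log probability the forecast q assigns to the observed class y."""
    return np.log(q[y])

def brier_score(q, y):
    """Negative squared error between the forecast q and the one-hot label y."""
    onehot = np.zeros_like(q)
    onehot[y] = 1.0
    return -np.sum((q - onehot) ** 2)

def expected_score(score, p, q):
    """Expected score of forecast q when labels are drawn from distribution p."""
    return sum(p[y] * score(q, y) for y in range(len(p)))

# Assumed true class distribution and two candidate forecasts (illustrative values).
p = np.array([0.7, 0.2, 0.1])
honest = p.copy()
overconfident = np.array([0.9, 0.05, 0.05])

for name, score in [("log", log_score), ("brier", brier_score)]:
    s_honest = expected_score(score, p, honest)
    s_over = expected_score(score, p, overconfident)
    print(f"{name}: honest={s_honest:.4f} overconfident={s_over:.4f}")
```

Under both scores the honest forecast receives the higher expected value, which is the property that distinguishes a strictly proper score from, for example, plain classification accuracy.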

Findings

01 BEMPS outperforms other acquisition functions in experiments.

02 Proper scoring rules lead to more robust active learning.

03 Diversity in batch selection improves learning efficiency.

Abstract

We study acquisition functions for active learning (AL) for text classification. The Expected Loss Reduction (ELR) method focuses on a Bayesian estimate of the reduction in classification error, recently updated with Mean Objective Cost of Uncertainty (MOCU). We convert the ELR framework to estimate the increase in (strictly proper) scores like log probability or negative mean square error, which we call Bayesian Estimate of Mean Proper Scores (BEMPS). We also prove convergence results borrowing techniques used with MOCU. In order to allow better experimentation with the new acquisition functions, we develop a complementary batch AL algorithm, which encourages diversity in the vector of expected changes in scores for unlabelled data. To allow high performance text classifiers, we combine ensembling and dynamic validation set construction on pretrained language models. Extensive…
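One way to read the abstract is sketched below: for each candidate, estimate how much a strictly proper score would be expected to change across the pool if that candidate were labelled, then pick a batch whose vectors of expected changes differ from one another. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes an ensemble whose members are reweighted by their predicted probability of the hypothetical label (instead of being retrained), it measures the score change as a divergence between the reweighted and current predictive distributions, and it uses greedy farthest-first selection as a stand-in for the paper's batch algorithm. The names `expected_score_gain` and `diverse_batch` are hypothetical.

```python
import numpy as np

def expected_score_gain(probs, score="log", eps=1e-12):
    """Hypothetical expected-score-change estimate from ensemble predictions.

    probs: array of shape (E, N, C) with class probabilities from E ensemble
           members for N unlabelled points and C classes.
    Returns (gains, values): gains[i, j] is the expected score change at pool
    point j if point i were labelled; values[i] is the mean of row i.
    """
    E, N, C = probs.shape
    current = probs.mean(axis=0)                        # (N, C) current predictive
    gains = np.zeros((N, N))
    for i in range(N):
        for y in range(C):
            # Assumption: approximate the posterior after observing (x_i, y)
            # by reweighting members by the probability they gave to y.
            w = probs[:, i, y] + eps
            w = w / w.sum()
            updated = np.einsum("e,enc->nc", w, probs)  # (N, C) updated predictive
            if score == "log":
                # KL divergence, the score gap associated with the log score.
                div = np.sum(updated * (np.log(updated + eps)
                                        - np.log(current + eps)), axis=1)
            else:
                # Squared distance, the score gap associated with squared error.
                div = np.sum((updated - current) ** 2, axis=1)
            gains[i] += current[i, y] * div             # average over likely labels
    return gains, gains.mean(axis=1)

def diverse_batch(gains, values, batch_size, top_frac=0.5):
    """Stand-in batch picker: farthest-first traversal over rows of `gains`."""
    n_keep = max(batch_size, int(len(values) * top_frac))
    cand = np.argsort(-values)[:n_keep]                 # keep high-value candidates
    chosen = [cand[0]]
    while len(chosen) < batch_size:
        diff = gains[cand][:, None, :] - gains[chosen][None, :, :]
        d = np.min(np.linalg.norm(diff, axis=2), axis=1)  # distance to nearest pick
        d[np.isin(cand, chosen)] = -1.0                 # never re-pick a point
        chosen.append(cand[int(np.argmax(d))])
    return [int(i) for i in chosen]

# Synthetic ensemble output: 5 members, 20 pool points, 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 20, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
gains, values = expected_score_gain(probs)
batch = diverse_batch(gains, values, batch_size=4)
print(batch)
```

If all ensemble members agree, reweighting changes nothing and every expected gain is zero, so the estimate only rewards candidates on which the ensemble disagrees in a way that would shift predictions elsewhere in the pool.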

Code & Models

Repositories

davidtw999/bemps · PyTorch · Official

Videos

Diversity Enhanced Active Learning with Strictly Proper Scoring Rules · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Natural Language Processing Techniques · Topic Modeling

Methods: Expected Loss Reduction (ELR)