ALdataset: a benchmark for pool-based active learning

Xueying Zhan; Antoni Bert Chan

arXiv:2010.08161·cs.LG·October 19, 2020

Open Access

TL;DR

This paper introduces ALdataset, a benchmark for pool-based active learning (AL). It provides benchmarking datasets and metrics so that different AL strategies can be compared and evaluated fairly.

Contribution

The paper presents a standardized benchmark, consisting of datasets and metrics, for evaluating pool-based active learning methods, addressing the lack of comparative benchmarking among existing techniques.

Findings

01 Benchmarking datasets and metrics are established.

02 Experimental results compare various AL strategies.

03 Insights into the effectiveness of different methods are provided.

Abstract

Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm can achieve good accuracy with fewer training samples by interactively querying a user/oracle to label new data points. Pool-based AL is well-motivated in many ML tasks, where unlabeled data is abundant but labels are hard to obtain. Although many pool-based AL methods have been developed, the lack of comparative benchmarking and integration of techniques makes it difficult to: 1) determine the current state-of-the-art technique; 2) evaluate the relative benefit of new methods for various properties of the dataset; 3) understand what specific problems merit greater attention; and 4) measure the progress of the field over time. To make comparative evaluation among AL methods easier, we present a benchmark task for pool-based active learning, which consists of benchmarking datasets and…
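The pool-based setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's benchmark code: the synthetic two-blob data, the nearest-centroid model, the margin-based uncertainty query, and all function names (`make_blobs`, `run_active_learning`, and so on) are assumptions chosen only to show the query-label-retrain loop and the kind of accuracy-versus-labels curve a benchmark would compare.

```python
import random


def make_blobs(n_per_class, seed):
    """Two Gaussian blobs in 2-D; synthetic stand-in data, not from the paper."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(-1.0, -1.0), (1.0, 1.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def fit_centroids(X, y, labeled):
    """Toy model: one centroid per class, computed from labeled points only."""
    centroids = {}
    for c in {y[i] for i in labeled}:
        pts = [X[i] for i in labeled if y[i] == c]
        centroids[c] = (sum(p[0] for p in pts) / len(pts),
                        sum(p[1] for p in pts) / len(pts))
    return centroids


def predict(centroids, x):
    return min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def margin(centroids, x):
    """Gap between the two nearest centroids; a small gap means an uncertain point."""
    d = sorted(sq_dist(x, c) for c in centroids.values())
    return d[1] - d[0]


def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(X)


def run_active_learning(X_pool, y_pool, X_test, y_test, budget=20, seed=0):
    rng = random.Random(seed)
    # Start with one labeled point per class so every centroid is defined.
    labeled = [rng.choice([i for i, t in enumerate(y_pool) if t == c])
               for c in (0, 1)]
    unlabeled = set(range(len(X_pool))) - set(labeled)
    curve = []  # (number of labels, test accuracy) after each retraining
    for _ in range(budget):
        model = fit_centroids(X_pool, y_pool, labeled)
        curve.append((len(labeled), accuracy(model, X_test, y_test)))
        # Query step: pick the pool point the current model is least sure about.
        query = min(sorted(unlabeled), key=lambda i: margin(model, X_pool[i]))
        # "Oracle" step: here the label is simply looked up in y_pool.
        labeled.append(query)
        unlabeled.remove(query)
    return labeled, curve


X_pool, y_pool = make_blobs(100, seed=0)
X_test, y_test = make_blobs(100, seed=1)
labeled, curve = run_active_learning(X_pool, y_pool, X_test, y_test)
print(curve[0], curve[-1])
```

Swapping the `margin` function for another scoring rule (or for random selection as a baseline) and comparing the resulting `curve` lists is, in miniature, the kind of comparison a shared benchmark is meant to standardize.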

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Algorithms and Data Compression