ALdataset: a benchmark for pool-based active learning
Xueying Zhan, Antoni Bert Chan

TL;DR
This paper introduces ALdataset, a benchmark for pool-based active learning (AL) that provides datasets and metrics for fair comparison and evaluation of different AL strategies.
Contribution
It presents a standardized benchmark with datasets and metrics for evaluating pool-based active learning methods, addressing the lack of comparative evaluation tools.
Findings
Benchmarking datasets and metrics are established.
Experimental results compare various AL strategies.
Insights into the effectiveness of different methods are provided.
Abstract
Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm can achieve good accuracy with fewer training samples by interactively querying a user/oracle to label new data points. Pool-based AL is well-motivated in many ML tasks where unlabeled data is abundant but labels are hard to obtain. Although many pool-based AL methods have been developed, the lack of comparative benchmarking and integration of techniques makes it difficult to: 1) determine the current state-of-the-art technique; 2) evaluate the relative benefit of new methods for various properties of the dataset; 3) understand what specific problems merit greater attention; and 4) measure the progress of the field over time. To conduct easier comparative evaluation among AL methods, we present a benchmark task for pool-based active learning, which consists of benchmarking datasets and…
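The pool-based setting described in the abstract can be illustrated with a minimal sketch. It is not the benchmark's code or any method from the paper: the synthetic data, the nearest-centroid classifier, the margin-based uncertainty rule, and all function names (`make_pool`, `centroid_distances`, `run_active_learning`) are assumptions chosen only to show the query loop.

```python
import numpy as np

def make_pool(n=200, seed=0):
    """Synthetic two-class pool (hypothetical data, not from ALdataset)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, 2)),
                   rng.normal(+1.0, 1.0, (n // 2, 2))])
    y = np.array([0] * (n // 2) + [1] * (n // 2))
    return X, y

def centroid_distances(X_lab, y_lab, X):
    """Distance from each row of X to the centroid of each labeled class."""
    c0 = X_lab[y_lab == 0].mean(axis=0)
    c1 = X_lab[y_lab == 1].mean(axis=0)
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return d0, d1

def run_active_learning(X, y, n_queries=20, seed=0):
    """Pool-based loop: fit, score the pool, query the most uncertain point."""
    rng = np.random.default_rng(seed)
    # Assumption: start with one labeled example per class.
    labeled = [int(rng.choice(np.flatnonzero(y == 0))),
               int(rng.choice(np.flatnonzero(y == 1)))]
    unlabeled = [i for i in range(len(X)) if i not in labeled]
    accuracies = []
    for _ in range(n_queries):
        d0, d1 = centroid_distances(X[labeled], y[labeled], X)
        pred = (d1 < d0).astype(int)
        # Accuracy is measured on the whole pool, for illustration only.
        accuracies.append(float((pred == y).mean()))
        # Uncertainty sampling: smallest gap between the two class distances.
        margin = np.abs(d0 - d1)[unlabeled]
        query = unlabeled[int(np.argmin(margin))]
        # The "oracle" is simulated by revealing the stored label y[query].
        labeled.append(query)
        unlabeled.remove(query)
    return labeled, accuracies

if __name__ == "__main__":
    X, y = make_pool()
    labeled, accuracies = run_active_learning(X, y, n_queries=10)
    print(len(labeled), accuracies[-1])
```

Replacing the margin rule with random selection from `unlabeled` gives the usual passive-learning baseline, so the two accuracy curves can be compared on the same pool.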
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Algorithms and Data Compression
