A Cross-Domain Benchmark for Active Learning

Thorben Werner; Johannes Burchert; Maximilian Stubbemann; Lars Schmidt-Thieme

arXiv:2408.00426·cs.LG·November 13, 2024

TL;DR

This paper introduces CDALBench, a comprehensive cross-domain active learning benchmark with extensive repetitions, revealing that method performance varies significantly across domains and runs, emphasizing the need for robust evaluation.

Contribution

It presents the first cross-domain active learning benchmark with high-repetition evaluation, highlighting the importance of domain diversity and multiple runs for reliable AL method assessment.

Findings

01 Method performance varies across domains.

02 A high number of repetitions is crucial for reliable evaluation.

03 The performance of established methods can vary dramatically depending on the seed.

Abstract

Active Learning (AL) deals with identifying the most informative samples for labeling to reduce data annotation costs for supervised learning tasks. AL research suffers from the fact that performance lifts reported in the literature generalize poorly and that experiments are repeated only a small number of times. To overcome these obstacles, we propose CDALBench, the first active learning benchmark which includes tasks in computer vision, natural language processing and tabular learning. Furthermore, by providing an efficient, greedy oracle, CDALBench can be evaluated with 50 runs for each experiment. We show that both the cross-domain character and a large number of repetitions are crucial for sophisticated evaluation of AL research. Concretely, we show that the superiority of specific methods varies over the different domains, making it important to evaluate Active Learning with a cross-domain…
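To make the point about repetitions concrete, the following is a minimal toy sketch of a pool-based active learning loop evaluated over several seeds. It is not CDALBench's code, datasets, or oracle: the 1-D threshold task, the function names (`make_pool`, `fit_threshold`, `run_once`, `evaluate`), and the budget of 10 labels are all assumptions made for illustration. It compares a margin-style query rule against random sampling and reports the mean and standard deviation of accuracy across seeds.

```python
import random
import statistics


def make_pool(rng, n=200):
    """Toy 1-D pool (hypothetical task): label is 1 when x exceeds a hidden threshold."""
    threshold = rng.uniform(0.3, 0.7)
    xs = [rng.random() for _ in range(n)]
    ys = [int(x > threshold) for x in xs]
    return xs, ys


def fit_threshold(xs, ys, labeled):
    """Estimate the decision threshold from the labeled indices only."""
    neg = [xs[i] for i in labeled if ys[i] == 0]
    pos = [xs[i] for i in labeled if ys[i] == 1]
    lo = max(neg) if neg else 0.0
    hi = min(pos) if pos else 1.0
    return (lo + hi) / 2


def accuracy(xs, ys, t):
    """Fraction of pool points classified correctly by threshold t."""
    return sum(int(x > t) == y for x, y in zip(xs, ys)) / len(xs)


def run_once(strategy, seed, budget=10):
    """One active learning run; the seed fixes the pool, the seed set, and random picks."""
    rng = random.Random(seed)
    xs, ys = make_pool(rng)
    labeled = set(rng.sample(range(len(xs)), 2))  # random initial labeled set
    while len(labeled) < budget:
        t = fit_threshold(xs, ys, labeled)
        unlabeled = [i for i in range(len(xs)) if i not in labeled]
        if strategy == "margin":
            # Query the point closest to the current decision boundary.
            pick = min(unlabeled, key=lambda i: abs(xs[i] - t))
        else:
            # Baseline: query a uniformly random unlabeled point.
            pick = rng.choice(unlabeled)
        labeled.add(pick)
    return accuracy(xs, ys, fit_threshold(xs, ys, labeled))


def evaluate(strategy, repetitions):
    """Mean and sample standard deviation of accuracy over seeds 0..repetitions-1."""
    scores = [run_once(strategy, seed) for seed in range(repetitions)]
    return statistics.mean(scores), statistics.stdev(scores)


for reps in (3, 50):
    for strategy in ("margin", "random"):
        mean, std = evaluate(strategy, reps)
        print(f"{strategy:>6} | {reps:2d} runs | mean={mean:.3f} std={std:.3f}")
```

Comparing the 3-run and 50-run rows gives a feel for how much a small number of repetitions can shift the estimated ranking of two query strategies, even on a task this simple.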

Code & Models

Repositories

wernerth94/a-cross-domain-benchmark-for-active-learning
PyTorch · Official

Videos

A Cross-Domain Benchmark for Active Learning · SlidesLive

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning