Batch Active Learning at Scale

Gui Citovsky; Giulia DeSalvo; Claudio Gentile; Lazaros Karydas; Anand Rajagopalan; Afshin Rostamizadeh; Sanjiv Kumar

arXiv:2107.14263 · cs.LG · August 2, 2021 · 47 cites

TL;DR

This paper presents a scalable batch active learning algorithm that combines uncertainty and diversity, enabling efficient training with very large batch sizes and providing theoretical label complexity guarantees.

Contribution

It introduces a scalable sampling method for batch active learning that handles very large batch sizes and offers theoretical label complexity guarantees.

Findings

01. Scales to batch sizes of 100K-1M, several orders of magnitude larger than those used in previous studies.

02. Achieves significant improvements in model training efficiency over recent baselines.

03. Provides an initial theoretical analysis with label complexity guarantees.

Abstract

The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. The practical benefits of batch sampling come with the downside of less adaptivity and the risk of sampling redundant examples within a batch -- a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies and provides significant improvements in model training efficiency compared to…
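The abstract says the sampling method combines uncertainty and diversity but, as quoted here, does not spell out the procedure. The block below is only a minimal sketch of one generic way to combine the two ideas, not the paper's algorithm: it keeps the lowest-margin examples as candidates, then draws from them round-robin across precomputed clusters. The function names, the `candidate_factor` parameter, and the assumption that cluster ids are already available are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import List, Sequence


def margin_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Margin = top-1 minus top-2 class probability; a small margin means high uncertainty."""
    scores = []
    for p in probs:
        top = sorted(p, reverse=True)
        scores.append(top[0] - top[1])
    return scores


def select_batch(
    probs: Sequence[Sequence[float]],
    cluster_ids: Sequence[int],
    batch_size: int,
    candidate_factor: int = 3,  # hypothetical knob: size of the uncertain pool relative to the batch
) -> List[int]:
    """Return indices of a batch that is uncertain (low margin) and spread across clusters."""
    margins = margin_scores(probs)

    # Uncertainty step: keep the candidate_factor * batch_size lowest-margin examples.
    n_candidates = min(len(probs), candidate_factor * batch_size)
    candidates = sorted(range(len(probs)), key=lambda i: margins[i])[:n_candidates]

    # Diversity step: bucket the candidates by their (precomputed, assumed) cluster id.
    # Each bucket stays ordered by ascending margin.
    buckets = defaultdict(list)
    for i in candidates:
        buckets[cluster_ids[i]].append(i)

    # Visit clusters round-robin, smallest cluster first, taking one example per visit.
    order = sorted(buckets, key=lambda c: (len(buckets[c]), c))
    target = min(batch_size, n_candidates)
    batch: List[int] = []
    while len(batch) < target:
        for c in order:
            if buckets[c] and len(batch) < target:
                batch.append(buckets[c].pop(0))
    return batch
```

In this sketch, a batch chosen by margin alone could come entirely from one dense cluster, which is the redundancy risk the abstract mentions. The round-robin pass trades a little uncertainty for coverage of more clusters.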

Code & Models

Repositories

airi-institute/al_toolbox
pytorch

Videos

Batch Active Learning at Scale · slideslive

Taxonomy

Topics: Machine Learning and Algorithms · Teaching and Learning Programming · Robot Manipulation and Learning