# Deep Active Learning with Adaptive Acquisition

**Authors:** Manuel Haussmann, Fred A. Hamprecht, Melih Kandemir

arXiv: 1906.11471 · 2019-06-28

## TL;DR

This paper introduces an adaptive active learning method that learns to select data points for labeling by training a predictor with reinforcement feedback, outperforming fixed heuristics across multiple datasets.

## Contribution

It proposes a novel reinforcement learning framework for adaptive acquisition function selection in active learning, addressing the lack of a universally best heuristic.

## Key findings

- Always finds a superior acquisition strategy or adapts to the best heuristic.
- Outperforms fixed heuristics on three benchmark datasets.
- Demonstrates effectiveness in scarce data regimes.

## Abstract

Model selection is treated as a standard performance boosting step in many machine learning applications. Once all other properties of a learning problem are fixed, the model is selected by grid search on a held-out validation set. This is strictly inapplicable to active learning. Within the standardized workflow, the acquisition function is chosen among available heuristics a priori, and its success is observed only after the labeling budget is already exhausted. More importantly, none of the earlier studies report a unique consistently successful acquisition heuristic to the extent to stand out as the unique best choice. We present a method to break this vicious circle by defining the acquisition function as a learning predictor and training it by reinforcement feedback collected from each labeling round. As active learning is a scarce data regime, we bootstrap from a well-known heuristic that filters the bulk of data points on which all heuristics would agree, and learn a policy to warp the top portion of this ranking in the most beneficial way for the character of a specific data distribution. Our system consists of a Bayesian neural net, the predictor, a bootstrap acquisition function, a probabilistic state definition, and another Bayesian policy network that can effectively incorporate this input distribution. We observe on three benchmark data sets that our method always manages to either invent a new superior acquisition function or to adapt itself to the a priori unknown best performing heuristic for each specific data set.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.11471/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1906.11471/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1906.11471/full.md

---
Source: https://tomesphere.com/paper/1906.11471