Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study

Aditya Siddhant; Zachary C. Lipton

arXiv:1808.05697·cs.CL·September 25, 2018

TL;DR

This paper presents a large-scale empirical study of deep Bayesian active learning in NLP. Across tasks, datasets, and models, Bayesian active learning by disagreement, with uncertainty estimates from Dropout or Bayes-by-Backprop, improves over i.i.d. baselines.

Contribution

The paper provides a large-scale empirical evaluation of deep active learning in NLP, covering multiple tasks and, for each task, multiple datasets, models, and acquisition functions.

Findings

01

Bayesian active learning improves over i.i.d. baselines.

02

Bayesian active learning by disagreement, with uncertainty estimates from Dropout or Bayes-by-Backprop, usually outperforms classic uncertainty sampling.

03

Bayesian disagreement-based methods are effective across tasks.

Abstract

Several recent papers investigate Active Learning (AL) for mitigating the data dependence of deep learning for natural language processing. However, the applicability of AL to real-world problems remains an open question. While in supervised learning, practitioners can try many different methods, evaluating each against a validation set before selecting a model, AL affords no such luxury. Over the course of one AL run, an agent annotates its dataset, exhausting its labeling budget. Thus, given a new task, an active learner has no opportunity to compare models and acquisition functions. This paper provides a large-scale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions. We find that across all settings, Bayesian active learning by disagreement, using uncertainty estimates provided…
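
To make the acquisition step concrete, here is a minimal sketch of one common Monte Carlo estimate of the Bayesian active learning by disagreement (BALD) score: the entropy of the averaged prediction minus the average entropy of the individual stochastic predictions. This is an illustration under stated assumptions, not necessarily the estimator used in the paper. The function names (`entropy`, `bald_score`, `select_batch`) and the toy pool are hypothetical, and the probability vectors stand in for outputs of stochastic forward passes (for example, different dropout masks).

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(mc_probs):
    """Mutual-information estimate from T stochastic forward passes.

    mc_probs: list of T class-probability vectors for one example,
    one vector per pass (assumed to come from, e.g., dropout masks).
    """
    t = len(mc_probs)
    n_classes = len(mc_probs[0])
    # Average the predictive distribution over the T passes.
    mean_probs = [sum(p[c] for p in mc_probs) / t for c in range(n_classes)]
    predictive_entropy = entropy(mean_probs)
    expected_entropy = sum(entropy(p) for p in mc_probs) / t
    return predictive_entropy - expected_entropy


def select_batch(pool_mc_probs, k):
    """Return indices of the k pool examples with the highest score."""
    scores = [bald_score(p) for p in pool_mc_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical toy pool: 3 unlabeled examples, 2 passes each, 2 classes.
pool = [
    [[0.9, 0.1], [0.9, 0.1]],  # passes agree and are confident
    [[0.5, 0.5], [0.5, 0.5]],  # passes agree but are uncertain
    [[0.9, 0.1], [0.1, 0.9]],  # passes disagree
]
chosen = select_batch(pool, k=1)
```

In this toy pool the second example has the highest predictive entropy, so classic uncertainty sampling would favor it, while the disagreement score is zero for it because every pass returns the same distribution. Only the third example, where the passes contradict each other, receives a positive score.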
