On Statistical Bias In Active Learning: How and When To Fix It
Sebastian Farquhar, Yarin Gal, Tom Rainforth

TL;DR
This paper formalizes the statistical bias that active learning introduces, proposes corrective weights that remove it, and shows when removing the bias helps and when the bias itself aids training, notably for overparameterized models such as neural networks trained on relatively little data.
Contribution
The paper formalizes the bias in active learning, investigates when it is harmful or helpful, introduces corrective weights that remove it when doing so is beneficial, and explains why existing approaches that ignore the bias still succeed empirically.
Findings
Bias can be harmful or helpful depending on the context.
Corrective weights can effectively remove bias when needed.
Bias may aid training overparameterized models with limited data.
Abstract
Active learning is a powerful tool when labelling data is expensive, but it introduces a bias because the training data no longer follows the population distribution. We formalize this bias and investigate the situations in which it can be harmful and sometimes even helpful. We further introduce novel corrective weights to remove bias when doing so is beneficial. Through this, our work not only provides a useful mechanism that can improve the active learning approach, but also an explanation of the empirical successes of various existing approaches which ignore this bias. In particular, we show that this bias can be actively helpful when training overparameterized models -- like neural networks -- with relatively little data.
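The bias described in the abstract can be illustrated with a toy example. The sketch below is not the paper's estimator: it uses plain importance weighting under the simplifying assumption that points are drawn i.i.d. with replacement from a fixed acquisition distribution `q`. Real active learning samples without replacement from a distribution that changes at every step, so the paper's corrective weights necessarily differ. The names `losses`, `q`, `naive_risk`, `importance_weighted_risk`, and `expected_estimates` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def naive_risk(losses, sampled_idx):
    """Unweighted mean loss over the sampled points.

    Biased for the population mean when sampling is non-uniform."""
    return float(np.mean(losses[sampled_idx]))


def importance_weighted_risk(losses, sampled_idx, q):
    """Mean of L_i / (N * q_i) over the sampled points.

    Unbiased for the population mean loss, assuming indices are drawn
    i.i.d. with replacement from q (a simplification, see lead-in)."""
    n = len(losses)
    weights = 1.0 / (n * q[sampled_idx])
    return float(np.mean(weights * losses[sampled_idx]))


def expected_estimates(losses, q):
    """Exact expectation of each estimator for a single draw from q."""
    n = len(losses)
    expected_naive = float(np.sum(q * losses))
    expected_weighted = float(np.sum(q * (losses / (n * q))))
    return expected_naive, expected_weighted


# Toy pool of per-point losses (made-up numbers for illustration).
losses = np.array([0.1, 0.2, 0.4, 2.0, 3.0])

# Hypothetical acquisition distribution that favours high-loss points.
q = (losses + 0.5) / np.sum(losses + 0.5)

expected_naive, expected_weighted = expected_estimates(losses, q)
population_risk = float(np.mean(losses))

# One simulated run of 2000 acquisitions with a fixed seed.
rng = np.random.default_rng(0)
sampled_idx = rng.choice(len(losses), size=2000, replace=True, p=q)
naive = naive_risk(losses, sampled_idx)
weighted = importance_weighted_risk(losses, sampled_idx, q)

print(f"population risk:        {population_risk:.4f}")
print(f"E[naive estimate]:      {expected_naive:.4f}")
print(f"E[weighted estimate]:   {expected_weighted:.4f}")
print(f"simulated naive:        {naive:.4f}")
print(f"simulated weighted:     {weighted:.4f}")
```

In this toy setting the unweighted estimate overstates the population risk because high-loss points are over-sampled, while the weighted estimate matches it in expectation. Whether removing the bias actually improves training is a separate question, which the abstract says depends on the setting.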
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Algorithms and Data Compression
