Bayesian Batch Active Learning as Sparse Subset Approximation
Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato

TL;DR
This paper introduces a Bayesian batch active learning method that efficiently selects diverse data points for labeling, improving scalability and performance in large-scale supervised learning tasks.
Contribution
It proposes a novel approach that approximates the complete data posterior to generate diverse batches, generalizes to arbitrary models with random projections, and demonstrates effectiveness on large-scale tasks.
Findings
Produces diverse batches for efficient active learning
Generalizes to arbitrary models using random projections
Shows improved performance on large-scale regression and classification tasks
Abstract
Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and…
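The contrast between naive batch construction and a sparse subset approximation can be illustrated with a small sketch. It is not the paper's algorithm: it assumes each unlabeled point is summarized by a feature vector, treats the sum of all vectors as a stand-in for the complete-data quantity to be approximated, and uses plain matching pursuit (a swapped-in, generic technique) to pick a weighted subset. The function names `random_project`, `naive_top_k`, and `greedy_sparse_subset` are hypothetical, and the Gaussian projection is a generic random projection rather than the one the authors use.

```python
import numpy as np


def random_project(features, dim, seed=0):
    """Generic Gaussian random projection to `dim` dimensions (illustrative only)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((features.shape[1], dim)) / np.sqrt(dim)
    return features @ proj


def naive_top_k(features, batch_size):
    """Score every point once against the full target and take the top k."""
    features = np.asarray(features, dtype=float)
    target = features.sum(axis=0)
    norms = np.maximum(np.linalg.norm(features, axis=1), 1e-12)
    scores = (features @ target) / norms
    return [int(i) for i in np.argsort(-scores, kind="stable")[:batch_size]]


def greedy_sparse_subset(features, batch_size):
    """Matching pursuit: pick points one at a time against the current residual."""
    features = np.asarray(features, dtype=float)
    norms = np.maximum(np.linalg.norm(features, axis=1), 1e-12)
    residual = features.sum(axis=0)  # assumed stand-in for the full-pool target
    available = np.ones(len(features), dtype=bool)
    chosen, weights = [], []
    for _ in range(min(batch_size, len(features))):
        # Alignment of each normalized point with what is still unexplained.
        scores = (features @ residual) / norms
        scores[~available] = -np.inf
        idx = int(np.argmax(scores))
        # Least-squares weight for the chosen point, then update the residual.
        w = float(features[idx] @ residual) / norms[idx] ** 2
        residual = residual - w * features[idx]
        available[idx] = False
        chosen.append(idx)
        weights.append(w)
    return chosen, weights, float(np.linalg.norm(residual))


if __name__ == "__main__":
    # Toy pool: points 0 and 1 are exact duplicates, point 2 is different.
    pool = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(naive_top_k(pool, 2))
    print(greedy_sparse_subset(pool, 2)[0])
```

On this toy pool the naive scorer returns the two duplicate points `[0, 1]`, while the residual-based selection returns `[0, 2]`: once point 0 has been chosen, its duplicate no longer explains anything new. This is only meant to show why scoring against a shrinking residual tends to yield diverse batches.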
Taxonomy
Topics: Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference · Algorithms and Data Compression
