Efficient Bayesian Updates for Deep Active Learning via Laplace Approximations

Denis Huseljic; Marek Herde; Lukas Rauch; Paul Hahn; Zhixin Huang; Daniel Kottke; Stephan Vogt; Bernhard Sick

arXiv:2210.06112·cs.LG·March 12, 2026·1 cites

Open Access · 1 Repo · 5 Reviews

TL;DR

This paper introduces an efficient Bayesian update method using Laplace approximations to improve deep active learning, reducing computational costs while maintaining performance, and enabling sequential batch selection strategies.

Contribution

It presents a novel second-order Bayesian update for deep neural networks that approximates retraining efficiently, facilitating better batch selection in active learning.

Findings

01 · The proposed update closely matches retraining performance.

02 · It significantly reduces computational complexity.

03 · It enables sequential and look-ahead batch selection strategies.

Abstract

Deep active learning (AL) selects batches of instances for annotation to avoid retraining deep neural networks (DNNs) after each new label. Employing a naive top-$b$ selection can result in a batch of redundant (similar) instances. To address this, various AL strategies employ clustering techniques that ensure diversity within a batch. We approach this issue by substituting the costly retraining with an efficient Bayesian update. Our proposed update represents a second-order optimization step using the Gaussian posterior from a last-layer Laplace approximation. Thereby, we achieve low computational complexity by computing the inverse Hessian in closed form. We demonstrate that in typical AL settings, our update closely approximates retraining while being considerably faster. Leveraging our update, we introduce a new framework for batch selection through sequential construction, updating…
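
As an illustration of the kind of update the abstract describes, here is a minimal sketch for the simplest case: a binary logistic last layer with a Gaussian posterior N(mu, Sigma) over its weights. One Newton step on a new labeled instance changes the precision by a rank-one term, so the new covariance follows in closed form from the Sherman-Morrison formula. The function names, the margin score, and the use of hard pseudo-labels during batch construction are assumptions made for this example; this is not the paper's implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bayesian_update(mu, Sigma, x, y):
    """One Newton step on a single instance (x, y), starting from N(mu, Sigma).

    Assumes a binary logistic likelihood on last-layer features x.
    """
    p = sigmoid(mu @ x)
    h = p * (1.0 - p)  # curvature of the negative log-likelihood at mu
    Sx = Sigma @ x
    # Sherman-Morrison: (Sigma^{-1} + h x x^T)^{-1} without a matrix inverse.
    Sigma_new = Sigma - (h / (1.0 + h * (x @ Sx))) * np.outer(Sx, Sx)
    # Newton step: mean minus inverse Hessian times gradient (p - y) x.
    mu_new = mu - Sigma_new @ ((p - y) * x)
    return mu_new, Sigma_new


def margin_scores(mu, X):
    """Uncertainty score in [0, 1]; 1 means the instance is on the boundary."""
    p = sigmoid(X @ mu)
    return 1.0 - np.abs(2.0 * p - 1.0)


def top_b(mu, X, b):
    """Naive selection: the b highest-scoring instances under a fixed model."""
    order = np.argsort(-margin_scores(mu, X), kind="stable")
    return [int(i) for i in order[:b]]


def sequential_batch(mu, Sigma, X, b):
    """Build the batch one instance at a time, updating after each pick.

    Assumption: the unknown label is replaced by the model's own hard
    prediction (a pseudo-label) so that the update can be applied.
    """
    mu, Sigma = mu.copy(), Sigma.copy()
    taken = np.zeros(len(X), dtype=bool)
    chosen = []
    for _ in range(b):
        scores = margin_scores(mu, X)
        scores[taken] = -np.inf  # never pick the same instance twice
        i = int(np.argmax(scores))
        taken[i] = True
        chosen.append(i)
        y_hat = float(sigmoid(mu @ X[i]) >= 0.5)
        mu, Sigma = bayesian_update(mu, Sigma, X[i], y_hat)
    return chosen
```

On a toy pool that contains two identical instances near the decision boundary, `top_b` returns both duplicates, whereas `sequential_batch` picks one of them, becomes more confident about its twin after the update, and fills the second slot with a different instance.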

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 3

Strengths

The proposed method is interesting because it could potentially turn any sequential AL strategy into a batch version, especially since it is efficient and computationally inexpensive. Experimental results on different datasets demonstrate the effectiveness of the proposed method.

Weaknesses

There are some concerns about the soundness of the proposed method. Details follow.
* The proposed method rests on the assumption that the original dataset $\mathcal{D}$ and the newly acquired dataset $\mathcal{D}^{\oplus}$ are i.i.d., but this is false for an actively acquired dataset.
* In Section 4, an obvious baseline seems to be missing, i.e., fine-tuning the last layer on the newly obtained dataset, the entire dataset, or a mixture of both. How would they compare

Reviewer 02 · Rating 3 · Confidence 5

Strengths

The paper studies an important problem in active learning, and the experiments demonstrate that the method is effective compared to various active learning baseline algorithms. The use of Bayesian approximation is also new in the context of active learning. The experiments are conducted on both text and image datasets with various transformer architectures.

Weaknesses

I have a couple of major concerns about this paper: 1. There are many existing ways of efficiently approximating model performance and reducing retraining cost in deep active learning. For example, Selection via Proxy [1] shows that a smaller proxy model can be an effective alternative. Along that line, LabelBench [2] shows that one can simply retrain the last layer during selection to obtain the same performance in the end. In another line of research [3-5], the authors leverage NTK approximations

Reviewer 03 · Rating 3 · Confidence 5

Strengths

The approach is based on a Gaussian approximation to the last layer of the network, and efficient approximations to update the distribution based on new labeled data. The idea of using the final layer for data selection in active learning is central to other schemes such as Badge and LabelBench (see reference for the latter below). However, this particular Bayesian approach to deep active learning appears to be a bit different from past approaches. The paper has two main components: 1) deriv

Weaknesses

Most of the paper is dedicated to the derivation of the last-layer Laplace approximation. The approach is straightforward, and I do not feel this is a significant contribution to the field. It is based on standard techniques and approximations. There is nothing particularly novel or insightful about the approach. The active learning approach is a standard margin-based procedure. The novelty is that it uses the last-layer Laplace approximation rather than the actual model and margin. This mean

Reviewer 04 · Rating 5 · Confidence 4

Strengths

This paper aims to tackle an important problem by efficiently updating the newly queried data in active learning. It’s well known that retraining from scratch is a bad idea since it unnecessarily consumes computational resources, while a naive update may lead to catastrophic forgetting problems. Based on these viewpoints, I believe this paper addresses an important issue in active learning. The paper is overall clearly written, with diverse experimental settings.

Weaknesses

I have two major concerns regarding the paper. Therefore, I do not think the current paper is ready for publication. - The first concern is a lack of discussion and comparison with a very related paper [1]. I believe these two issues in active learning have been clearly discussed, and different continual learning methods have been tested. When comparing paper [1] with the current submission, I would say paper [1] seems more comprehensive and useful in terms of practice, while the current submis

Reviewer 05 · Rating 6 · Confidence 3

Strengths

Interesting perspective on improving computational efficiency in AL. This method does not need to recompute the Hessian from scratch. Instead, its updates use covariances from Laplace approximations and the Woodbury identity for closed-form inversion. This is a very interesting work combining theoretical grounding and experimental evidence.
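
For reference, the Woodbury identity mentioned in this review can be written as follows. The second line is a hedged illustration of how it might apply here, assuming a current covariance $\Sigma \in \mathbb{R}^{d \times d}$, a feature matrix $X \in \mathbb{R}^{n \times d}$ for the $n$ new instances, and a positive-definite curvature matrix $H \in \mathbb{R}^{n \times n}$; these symbols are chosen for this example and are not taken from the paper.

```latex
% General Woodbury identity
(A + U C V)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}

% Special case with A = \Sigma^{-1}, U = X^{\top}, C = H, V = X
\left( \Sigma^{-1} + X^{\top} H X \right)^{-1}
  = \Sigma - \Sigma X^{\top} \left( H^{-1} + X \Sigma X^{\top} \right)^{-1} X \Sigma
```

Under these assumptions the only matrix that has to be inverted is $n \times n$ rather than $d \times d$, which is cheap when few instances are added at a time.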

Weaknesses

1. Lack of related work. For look-ahead approaches, as the authors illustrate in Section 5.2, "The idea of look-ahead strategies is to select instances that, once labeled and added to the labeled pool, maximize the performance of the model", there are many recent data-driven approaches, which train neural networks to predict the performance of the labeled pool after adding certain instances [1] or use well-calibrated probabilistic models to quantify the epistemic uncertainty about the unknown dy

Code & Models

Repositories

dhuseljic/dal-toolbox · PyTorch · Official

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Target Tracking and Data Fusion in Sensor Networks

Methods: Gaussian Process · Dropout