Representative, Informative, and De-Amplifying: Requirements for Robust Bayesian Active Learning under Model Misspecification

Roubing Tang; Sabina J. Sloman; Samuel Kaski

arXiv:2506.07805·stat.ML·April 2, 2026

TL;DR

This paper analyzes how model misspecification affects Bayesian experimental design, introduces a new acquisition function that mitigates error amplification, and reports improved performance over existing methods.

Contribution

The paper provides a mathematical analysis of generalization error under model misspecification and proposes R-IDeA, a novel acquisition function for robust Bayesian active learning.

Findings

1. The analysis reveals error (de-)amplification as a key factor in generalization error.
2. The R-IDeA acquisition function outperforms existing methods in experiments.
3. Including representativeness and de-amplification terms improves robustness.

Abstract

In many science and industry settings, a central challenge is designing experiments under time and budget constraints. Bayesian Optimal Experimental Design (BOED) is a paradigm to pick maximally informative designs that has been widely applied to such problems. During training, BOED selects inputs according to a pre-determined acquisition criterion to target informativeness. During testing, the model learned during training encounters a naturally occurring distribution of test samples. This leads to an instance of covariate shift, where the train and test samples are drawn from different distributions (the training samples are not representative of the test distribution). Prior work has shown that in the presence of model misspecification, covariate shift amplifies generalization error. Our first contribution is to provide a mathematical analysis of generalization error in the presence…
