None To Optima in Few Shots: Bayesian Optimization with MDP Priors

Diantong Li; Kyunghyun Cho; Chong Liu

arXiv:2511.01006·cs.LG·November 4, 2025

Open Access 3 Reviews

TL;DR

This paper introduces ProfBO, a Bayesian Optimization method that leverages MDP priors and meta-learning to efficiently optimize costly black-box functions with minimal evaluations, suitable for real-world applications.

Contribution

The paper presents ProfBO, a novel BO algorithm using MDP priors and meta-learning, enabling rapid adaptation and high-quality solutions with few evaluations.

Findings

01 ProfBO outperforms existing methods on real-world benchmarks.

02 Achieves high-quality solutions with significantly fewer evaluations.

03 Demonstrates practical applicability in drug discovery and hyperparameter tuning.

Abstract

Bayesian Optimization (BO) is an efficient tool for optimizing black-box functions, but its theoretical guarantees typically hold in the asymptotic regime. In many critical real-world applications such as drug discovery or materials design, where each evaluation can be very costly and time-consuming, BO becomes impractical for many evaluations. In this paper, we introduce the Procedure-inFormed BO (ProfBO) algorithm, which solves black-box optimization with remarkably few function evaluations. At the heart of our algorithmic design are Markov Decision Process (MDP) priors that model optimization trajectories from related source tasks, thereby capturing procedural knowledge on efficient optimization. We embed these MDP priors into a prior-fitted neural network and employ model-agnostic meta-learning for fast adaptation to new target tasks. Experiments on real-world Covid and Cancer…
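To make the idea of treating an optimization run as an MDP trajectory concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the state, action, or reward definitions, so everything below is an assumption chosen for illustration. The state is assumed to be the history of evaluated points, the action is the index of the next candidate in a finite pool, and the reward is the improvement over the best value seen so far. The names `BOEpisode`, `rollout`, and `random_unvisited_policy` are hypothetical, and the random policy is only a placeholder for whatever learned policy would generate trajectories in practice.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = Tuple[Tuple[int, float], ...]


@dataclass
class BOEpisode:
    """One optimization run on a finite candidate pool, viewed as an MDP episode.

    Assumed formulation (not taken from the paper):
      state  = history of (candidate index, observed value) pairs
      action = index of the next candidate to evaluate
      reward = improvement over the best value observed so far
    """
    values: List[float]  # hidden objective value of each candidate
    horizon: int         # evaluation budget T
    history: List[Tuple[int, float]] = field(default_factory=list)

    def step(self, action: int) -> float:
        """Evaluate candidate `action` and return the improvement reward."""
        if len(self.history) >= self.horizon:
            raise RuntimeError("evaluation budget exhausted")
        y = self.values[action]
        if not self.history:
            # First evaluation: there is no incumbent, so the reward is y itself.
            reward = y
        else:
            incumbent = max(v for _, v in self.history)
            reward = max(0.0, y - incumbent)
        self.history.append((action, y))
        return reward


def random_unvisited_policy(state: History, n_candidates: int,
                            rng: random.Random) -> int:
    """Placeholder policy: pick uniformly among candidates not yet evaluated."""
    visited = {i for i, _ in state}
    return rng.choice([i for i in range(n_candidates) if i not in visited])


def rollout(values: List[float], horizon: int,
            policy: Callable[[History, int, random.Random], int],
            seed: int = 0) -> List[Tuple[History, int, float]]:
    """Run one episode and return its (state, action, reward) transitions."""
    rng = random.Random(seed)
    episode = BOEpisode(values=list(values), horizon=horizon)
    transitions = []
    for _ in range(horizon):
        state = tuple(episode.history)  # snapshot before acting
        action = policy(state, len(values), rng)
        reward = episode.step(action)
        transitions.append((state, action, reward))
    return transitions
```

With this reward, the sum of rewards along a trajectory equals the best value found within the budget, so a policy that maximizes return is one that finds good points in few evaluations. Trajectories collected this way on source tasks are the kind of procedural data a prior could be fitted to, under the assumptions stated above.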

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

* Clear and motivated problem setting: BO in regimes where $T \le 20$ and evaluations are expensive, which is where standard asymptotic BO results are less useful.
* Procedural transfer via MDP priors is novel in this particular PFN + MAML setup and allows the surrogate to internalize good search patterns rather than only response surfaces.
* The PFN backbone gives a principled way to do single-pass Bayesian inference over contexts, leading to faster inference than GP posteriors and enabling the

Weaknesses

* The approach depends critically on the availability and quality of source-task optimization trajectories; the paper does not quantify how performance degrades when source tasks are few, noisy, or mismatched with the target.
* The MDP prior is learned with per-task DQN on (possibly) large discrete action spaces, and although the authors optimize it (subset of actions, batched GPU generation), this can still be expensive in domains without precollected meta-data.
* The method is benchmarked most

Reviewer 02 · Rating 2 · Confidence 3

Strengths

* The main contribution of the paper is the MDP prior and the associated incorporation of optimization trajectories into a PFN framework. This is novel and quite interesting. The general strategy also seems like it could be useful outside the transfer learning / metalearning setting that is the focus of the paper.
* The paper also introduces new problems that can be used for evaluating transfer learning methods that are based on real-world problems and seem that they will be useful for future w

Weaknesses

* Feasibility of learning optimization trajectory policy on source tasks: The method includes learning a DQN policy network on the source tasks during fine tuning. As I understand it, this requires being able to make new evaluations of the source tasks, and in particular, not just using whatever optimization trajectory you happen to have from some earlier optimization on this task. Is that correct? The typical assumption in metalearning for BO is that the data you have from each source task come

Reviewer 03 · Rating 2 · Confidence 3

Strengths

* Tackles an important established problem setting highly relevant to the ICLR community
* Usually PFNs are trained with synthetic data, incorporating actual evaluations is an interesting research direction and a good fit for the tackled problem setting
* Combining MAML and PFN is an interesting methodological contribution
* Follows best practices for reproducibility
* Experimental setup, including baselines, protocol, and used datasets described in sufficient detail.
* Sufficient discussion on

Weaknesses

* The paper does not discuss limitations of their method and experimental design prominently
* The ablation study is not convincing. The results of the ablation study as discussed in the text may not hold in the target range of 20 evaluations. Results are very noisy here and their un-ablated algorithm is not the preferred choice. The ablation study is not performed on the covid-b and cancer-b benchmarks. Why?
* The approach does not take into account categorical parameters. How about integer par

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference