None To Optima in Few Shots: Bayesian Optimization with MDP Priors
Diantong Li, Kyunghyun Cho, Chong Liu

TL;DR
This paper introduces ProfBO, a Bayesian Optimization method that leverages MDP priors and meta-learning to optimize costly black-box functions with very few evaluations, making it suitable for real-world applications where each evaluation is expensive.
Contribution
The paper presents ProfBO, a novel BO algorithm that combines MDP priors with meta-learning, enabling rapid adaptation to new tasks and high-quality solutions with few evaluations.
Findings
ProfBO outperforms existing methods on real-world benchmarks.
Achieves high-quality solutions with significantly fewer evaluations.
Demonstrates practical applicability in drug discovery and hyperparameter tuning.
Abstract
Bayesian Optimization (BO) is an efficient tool for optimizing black-box functions, but its theoretical guarantees typically hold in the asymptotic regime. In many critical real-world applications, such as drug discovery or materials design, each evaluation can be very costly and time-consuming, so running BO for many evaluations is impractical. In this paper, we introduce the Procedure-inFormed BO (ProfBO) algorithm, which solves black-box optimization with remarkably few function evaluations. At the heart of our algorithmic design are Markov Decision Process (MDP) priors that model optimization trajectories from related source tasks, thereby capturing procedural knowledge about efficient optimization. We embed these MDP priors into a prior-fitted neural network (PFN) and employ model-agnostic meta-learning (MAML) for fast adaptation to new target tasks. Experiments on real-world Covid and Cancer…
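The abstract describes MDP priors built from optimization trajectories but does not spell out the MDP here. The sketch below is one plausible, hypothetical encoding, not the paper's code: it assumes a discrete candidate set (reviewers mention discrete action spaces), a state equal to the history of (x, y) observations, an action equal to the index of the next candidate, and a reward equal to the negative simple regret on a source task whose optimum is known. The names `Transition`, `record_trajectory`, and `random_policy` are illustrative only.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple

# A state is the history of (x, y) observations collected so far.
History = Tuple[Tuple[float, float], ...]
Policy = Callable[[History, Sequence[float], Set[int]], int]


@dataclass(frozen=True)
class Transition:
    state: History
    action: int        # index into the discrete candidate set
    reward: float      # negative simple regret after this evaluation (assumed reward)
    next_state: History


def record_trajectory(f: Callable[[float], float], candidates: Sequence[float],
                      policy: Policy, budget: int) -> List[Transition]:
    """Roll out `policy` on a source task `f` (maximization) for `budget` steps."""
    if budget > len(candidates):
        raise ValueError("budget exceeds the number of distinct candidates")
    # Assumption: the source task is fully tabulated, so its optimum is known.
    f_max = max(f(x) for x in candidates)
    history: History = ()
    tried: Set[int] = set()
    best = float("-inf")
    transitions: List[Transition] = []
    for _ in range(budget):
        action = policy(history, candidates, tried)
        tried.add(action)
        x = candidates[action]
        y = f(x)
        best = max(best, y)
        next_history = history + ((x, y),)
        transitions.append(Transition(history, action, -(f_max - best), next_history))
        history = next_history
    return transitions


def random_policy(rng: random.Random) -> Policy:
    """Hypothetical baseline policy: pick an untried candidate uniformly at random."""
    def act(history: History, candidates: Sequence[float], tried: Set[int]) -> int:
        untried = [i for i in range(len(candidates)) if i not in tried]
        return rng.choice(untried)
    return act


if __name__ == "__main__":
    f = lambda x: -(x - 0.3) ** 2            # toy source task with optimum at x = 0.3
    grid = [i / 10 for i in range(11)]       # 11 discrete candidates in [0, 1]
    for t in record_trajectory(f, grid, random_policy(random.Random(0)), budget=5):
        print(t.action, round(t.reward, 4))
```

Because the best-so-far value can only improve, rewards in this encoding never decrease along a trajectory. A learned policy (for example, the per-task DQN that reviewers describe) would take the place of `random_policy` when generating trajectories.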
Peer Reviews
Decision: Submitted to ICLR 2026
* Clear and motivated problem setting: BO in regimes where $T \le 20$ and evaluations are expensive, which is where standard asymptotic BO results are less useful.
* Procedural transfer via MDP priors is novel in this particular PFN + MAML setup and allows the surrogate to internalize good search patterns rather than only response surfaces.
* The PFN backbone gives a principled way to do single-pass Bayesian inference over contexts, leading to faster inference than GP posteriors and enabling the…

* The approach depends critically on the availability and quality of source-task optimization trajectories; the paper does not quantify how performance degrades when source tasks are few, noisy, or mismatched with the target.
* The MDP prior is learned with per-task DQN on (possibly) large discrete action spaces, and although the authors optimize it (subset of actions, batched GPU generation), this can still be expensive in domains without precollected meta-data.
* The method is benchmarked most…

* The main contribution of the paper is the MDP prior and the associated incorporation of optimization trajectories into a PFN framework. This is novel and quite interesting. The general strategy also seems like it could be useful outside the transfer-learning / meta-learning setting that is the focus of the paper.
* The paper also introduces new problems for evaluating transfer-learning methods; they are based on real-world problems and seem likely to be useful for future w…

* Feasibility of learning an optimization-trajectory policy on source tasks: the method includes learning a DQN policy network on the source tasks during fine-tuning. As I understand it, this requires being able to make new evaluations of the source tasks, and in particular, not just using whatever optimization trajectory you happen to have from some earlier optimization on this task. Is that correct? The typical assumption in meta-learning for BO is that the data you have from each source task come…

* Tackles an important, established problem setting that is highly relevant to the ICLR community.
* PFNs are usually trained with synthetic data; incorporating actual evaluations is an interesting research direction and a good fit for the tackled problem setting.
* Combining MAML and PFN is an interesting methodological contribution.
* Follows best practices for reproducibility.
* The experimental setup, including baselines, protocol, and datasets used, is described in sufficient detail.
* Sufficient discussion on…

* The paper does not prominently discuss the limitations of its method and experimental design.
* The ablation study is not convincing. The results of the ablation study as discussed in the text may not hold in the target range of 20 evaluations. Results are very noisy here, and the un-ablated algorithm is not the preferred choice. The ablation study is not performed on the covid-b and cancer-b benchmarks. Why?
* The approach does not take into account categorical parameters. How about integer par…
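The abstract and reviewers single out model-agnostic meta-learning (MAML) as the fast-adaptation component. As a generic illustration only, the sketch below runs first-order MAML (FOMAML, which drops MAML's second-order terms) on a hypothetical family of 1-D linear-regression tasks. It is not ProfBO's PFN fine-tuning; the task family, learning rates, and all function names are assumptions.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]        # (w, b) of the model y_hat = w * x + b
Data = List[Tuple[float, float]]    # (x, y) pairs


def mse(p: Params, data: Data) -> float:
    w, b = p
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def mse_grad(p: Params, data: Data) -> Params:
    """Analytic gradient of the mean squared error with respect to (w, b)."""
    w, b = p
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb


def adapt(p: Params, support: Data, inner_lr: float = 0.1, steps: int = 1) -> Params:
    """Inner loop: a few gradient steps on the task's small support set."""
    w, b = p
    for _ in range(steps):
        gw, gb = mse_grad((w, b), support)
        w, b = w - inner_lr * gw, b - inner_lr * gb
    return w, b


def sample_task(rng: random.Random) -> Tuple[Data, Data]:
    """Hypothetical task family y = a * x + c; returns (support, query) splits."""
    a, c = rng.uniform(1.5, 2.5), rng.uniform(-0.5, 0.5)
    data = [(x, a * x + c) for x in (rng.uniform(-1, 1) for _ in range(10))]
    return data[:5], data[5:]


def fomaml(rng: random.Random, meta_steps: int = 2000, meta_lr: float = 0.05) -> Params:
    """Outer loop: update the shared initialization from post-adaptation query gradients."""
    p: Params = (0.0, 0.0)
    for _ in range(meta_steps):
        support, query = sample_task(rng)
        adapted = adapt(p, support)
        # First-order approximation: the query gradient at the adapted parameters
        # is applied directly to the initialization (no second derivatives).
        gw, gb = mse_grad(adapted, query)
        p = (p[0] - meta_lr * gw, p[1] - meta_lr * gb)
    return p


def eval_adapted(p: Params, rng: random.Random, n_tasks: int = 200) -> float:
    """Average query loss after adapting `p` on freshly sampled tasks."""
    total = 0.0
    for _ in range(n_tasks):
        support, query = sample_task(rng)
        total += mse(adapt(p, support), query)
    return total / n_tasks


if __name__ == "__main__":
    init = fomaml(random.Random(0))
    print("meta-learned init:", init)
    print("adapted loss (meta init):", eval_adapted(init, random.Random(1)))
    print("adapted loss (zero init):", eval_adapted((0.0, 0.0), random.Random(1)))
```

In this toy setting the meta-learned initialization should land near the center of the task family, so one inner gradient step on five support points gives a lower query loss than starting from zero. The first-order variant keeps the example dependency-free; full MAML would differentiate through the inner update.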
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
