Dynamic Priors in Bayesian Optimization for Hyperparameter Optimization
Lukas Fehring, Marcel Wever, Maximilian Spliethöver, Leona Hennig, Henning Wachsmuth, Marius Lindauer

TL;DR
DynaBO is a Bayesian optimization framework that allows continuous user control and incorporation of priors during hyperparameter tuning, improving efficiency and robustness in iterative machine learning workflows.
Contribution
We introduce DynaBO, a novel BO method that integrates user priors dynamically with theoretical guarantees and robustness features, advancing hyperparameter optimization techniques.
Findings
DynaBO outperforms state-of-the-art methods across various benchmarks.
It effectively incorporates and adapts to different user priors.
The framework maintains convergence guarantees even with misleading priors.
Abstract
Bayesian optimization (BO) is a widely used approach to hyperparameter optimization (HPO). However, most existing HPO methods only incorporate expert knowledge during initialization, limiting practitioners' ability to influence the optimization process as new insights emerge. This limits the applicability of BO in iterative machine learning development workflows. We propose DynaBO, a BO framework that enables continuous user control of the optimization process. Over time, DynaBO leverages provided user priors by augmenting the acquisition function with decaying, prior-weighted preferences while preserving asymptotic convergence guarantees. To reinforce robustness, we introduce a data-driven safeguard that detects and can be used to reject misleading priors. We prove theoretical results on near-certain convergence, robustness to adversarial priors, and accelerated convergence when…
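The abstract describes two mechanisms: an acquisition function augmented with decaying, prior-weighted preferences, and a data-driven safeguard against misleading priors. The sketch below illustrates both under stated assumptions; it is not DynaBO's implementation. The weighting follows the πBO style (Hvarfner et al., 2022), multiplying expected improvement by a prior density raised to a shrinking exponent. Here, the assumption is that each prior decays from the iteration at which the user supplied it. The safeguard criterion (reject a prior if even an optimistic lower confidence bound in its high-density region cannot beat the incumbent) is a hypothetical stand-in, as are all function names and the parameters `beta` and `kappa`.

```python
import math
from typing import Callable, List, Tuple

# (prior density scaled to peak 1, iteration at which the user supplied it)
Prior = Tuple[Callable[[float], float], int]


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Expected improvement for minimization under a Gaussian predictive distribution."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def gaussian_prior(center: float, width: float) -> Callable[[float], float]:
    """User belief over a 1-D hyperparameter, scaled so the density peaks at 1."""
    return lambda x: math.exp(-0.5 * ((x - center) / width) ** 2)


def prior_weighted_acquisition(acq: float, x: float, priors: List[Prior],
                               t: int, beta: float = 10.0) -> float:
    """Multiply the acquisition value by each prior raised to beta / (t - t_i + 1).

    The exponent shrinks as iterations pass since prior i was supplied,
    so each prior's multiplicative effect fades toward 1 (no influence).
    """
    value = acq
    for density, t_added in priors:
        elapsed = max(t - t_added, 0) + 1
        value *= density(x) ** (beta / elapsed)
    return value


def should_reject_prior(predict: Callable[[float], Tuple[float, float]],
                        prior_points: List[float], incumbent: float,
                        kappa: float = 2.0) -> bool:
    """Hypothetical safeguard (minimization): reject the prior if even the
    optimistic bound mu - kappa * sigma at its high-density points is worse
    than the best value observed so far."""
    best_lcb = min(mu - kappa * sigma for mu, sigma in map(predict, prior_points))
    return best_lcb > incumbent
```

Scaling each density to a peak of 1 is a convenience. For a single prior, a constant rescaling multiplies every candidate's score by the same factor, so it does not change which candidate maximizes the weighted acquisition. Right after a prior arrives, candidates far from its center are strongly suppressed. After many iterations, the weighted value approaches the raw acquisition value, which is the intuition behind retaining asymptotic convergence.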
Peer Reviews
Submitted to ICLR 2026
- The paper is clearly written. - The idea is novel, extending a previous work to incorporate dynamic priors. - There are many analyses to support the work, including theoretical analysis and experimental results. Regarding the experiments, there are many different benchmark settings, including different prior settings (from informative to misleading priors), different prior supplying times (fixed and random times) and different surrogate models (Gaussian Process and Random Forest).
1. The choice of a square exponent in the formula on line 206 is not clearly explained. The authors state that it avoids the slow fading of old priors, but this does not explain why a square (rather than another function, such as a cube) should be used. There should be an additional ablation study on this choice, because I think the fading speed of older priors is an important factor to consider. 2. It is not clear why some baselines are not compared in the experiments, such as PriorBand (Mallik
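To see why the exponent choice matters for how fast old priors fade, the sketch below compares a hypothetical family of schedules in which a prior's exponent is `beta / elapsed**power`. The paper's formula at line 206 is not reproduced on this page, so this family is an assumption: `power=1` mirrors the πBO-style β/n rate, while `power=2` and `power=3` correspond to a square and a cube. The names `prior_exponent` and `prior_factor` are illustrative only.

```python
def prior_exponent(beta: float, elapsed: int, power: float) -> float:
    """Hypothetical fading schedule: exponent beta / elapsed**power applied to a
    prior density `elapsed` iterations after the prior was supplied (elapsed >= 1)."""
    if elapsed < 1:
        raise ValueError("elapsed must be at least 1")
    return beta / elapsed ** power


def prior_factor(density: float, beta: float, elapsed: int, power: float) -> float:
    """Multiplicative effect on the acquisition value; 1.0 means no influence."""
    return density ** prior_exponent(beta, elapsed, power)


# Compare linear, square, and cubic fading for a point at half the prior's peak.
for power in (1, 2, 3):
    factors = [round(prior_factor(0.5, 10.0, n, power), 3) for n in (1, 5, 10, 20)]
    print(f"power={power}: {factors}")
```

With `beta=10`, a point at half the prior's peak still has its acquisition value halved after 10 iterations under the linear schedule, but only reduced to roughly 0.93 of its value under the square schedule. This is the kind of difference in fading speed that an ablation over the exponent would quantify.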
- Putting the human in the loop with HPO is an interesting (but not new) problem. - The experimental study is carried out on realistic HPO tuning tasks (albeit I have concerns about the choice of priors as below).
- The theoretical analyses provided are only asymptotic in nature; this is especially misleading for Theorem 2, where for finite-time, clearly misspecified priors will slow down performance. The analysis essentially relies on the fact that asymptotically the effect of the prior vanishes / becomes trivial. - The theoretical analysis seems to have some issues, and the exposition contains some vague / unjustified statements (see questions below) - While the objective functions in the experiment co
- The paper is generally well written and easy to follow. - The empirical evaluation appears thorough, comparing different priors across various benchmarks. The chosen baselines are reasonable, though the evaluation could potentially be enriched by including additional methods from the literature, for example Seng et al. - The paper includes an ablation study on the sensitivity of the hyperparameter tau, which seems to play a central role in the proposed method.
Significance: While the motivation to incorporate user-provided priors into the optimization process is clear and conceptually appealing, I have some concerns regarding its practical applicability. In many real-world scenarios, I would argue that users might find it more intuitive to provide such priors only at the beginning of the optimization rather than continuously throughout the process, especially in an AutoML context, where the goal is typically to maximize automation. Moreover, the curr
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research
