Multi-objective Hyperparameter Optimization in the Age of Deep Learning
Soham Basu, Frank Hutter, Danny Stoll

TL;DR
This paper introduces PriMO, a multi-objective hyperparameter optimization algorithm that incorporates user priors over multiple objectives and achieves state-of-the-art results on 8 deep learning benchmarks.
Contribution
PriMO is the first HPO algorithm to integrate multi-objective user beliefs, enhancing optimization performance in deep learning tasks.
Findings
Achieves state-of-the-art performance on 8 DL benchmarks.
Effectively incorporates prior knowledge and multi-objective beliefs.
Outperforms existing HPO algorithms in both single-objective and multi-objective settings.
Abstract
While Deep Learning (DL) experts often have prior knowledge about which hyperparameter settings yield strong performance, only a few Hyperparameter Optimization (HPO) algorithms can leverage such prior knowledge, and none incorporate priors over multiple objectives. As DL practitioners often need to optimize not just one but many objectives, this is a blind spot in the algorithmic landscape of HPO. To address this shortcoming, we introduce PriMO, the first HPO algorithm that can integrate multi-objective user beliefs. We show that PriMO achieves state-of-the-art performance across 8 DL benchmarks in both the multi-objective and single-objective settings, clearly positioning itself as the new go-to HPO algorithm for DL practitioners.
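To make "many objectives" concrete: a multi-objective optimizer does not return one best configuration. It returns the set of configurations that no other configuration beats on every objective at once, known as the Pareto front. The following is a minimal sketch of that filter, assuming all objectives are minimized; the objective pair (validation error, training cost) and the sample values are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere.

    Assumes every objective is minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical evaluations: (validation error, training cost in GPU hours).
points = [(0.10, 5.0), (0.12, 3.0), (0.11, 6.0), (0.20, 1.0), (0.15, 3.5)]
front = pareto_front(points)
print(front)  # [(0.1, 5.0), (0.12, 3.0), (0.2, 1.0)]
```

Here (0.11, 6.0) is dropped because (0.10, 5.0) is better on both objectives, and (0.15, 3.5) is dropped because (0.12, 3.0) is. The quadratic scan is fine for the handful of evaluations typical in HPO; larger sets would call for a sort-based filter.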
Peer Reviews
Decision · Submitted to ICLR 2026
The paper is very well written: definitions, algorithms, and experiments are presented cleanly and logically, making the work easy to follow. Multi-objective HPO is an important and practical topic for modern deep-learning workflows; addressing the lack of prior-aware solutions fills a real methodological gap. The proposed combination of prior-weighted acquisition with ε-greedy scheduling and a multi-fidelity initialization is conceptually coherent and empirically justified. The evaluation cover
1. Beyond the Pareto-front visualizations, the paper could include more case-level examples or qualitative comparisons to help readers connect the optimization behavior with real task utility and model performance trade-offs.
2. In Algorithm 2 (the BO step), the parameter η is listed but seems unused; clarifying whether it affects fidelity scheduling or is inherited from the initialization stage would improve completeness.
3. A brief theoretical or intuitive discussion about how PriMO behaves w
1. The experimental section is definitive. The authors benchmark PriMO against a wide spectrum of baselines, ranging from classical multi-objective evolutionary algorithms to multi-fidelity optimizers (MOASHA, Hyperband) to Bayesian approaches. They also construct custom baselines (e.g., MOASHA+Prior, πBO+RW) to isolate the benefits of priors in the multi-objective context. PriMO consistently outperforms across eight deep learning benchmarks (image classification, translation, and language model
1. While the authors' acquisition function in Equation (4) works well empirically, the paper lacks a theoretical analysis of its properties. Ideal results would describe under what conditions we get convergence to the true Pareto frontier, or how the exploration parameter interacts with uncertainty estimation in BO. It is hard to know what the *secret sauce* of this choice is. I think this work would be stronger if there were some clear and simple example to have in mind that demonstrates the is
- Utilizing prior knowledge in multi-objective HPO is a good, under-explored topic.
- Results exhibit good performance, whether the priors are good or bad.
- The title is too exaggerated in my eyes. HPO for deep learning faces numerous challenges, while the topic in this paper is only a very small one. Besides, it is not clear how the work addresses specific issues for deep learning.
- In practice, prior knowledge should be scarce and diverse. There is a lack of assumptions about the priors that this paper considers.
- The paper claims that priors can be good or bad. I wonder if it is a rigorous problem definition. How can you differentiate which
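The reviews above describe PriMO as combining a prior-weighted acquisition with ε-greedy scheduling, and one reviewer asks for a simple example to keep in mind. The sketch below is one possible illustration, not the paper's Equation (4): it assumes the πBO-style weighting, in which the acquisition value is multiplied by the prior density raised to the power β/n, and it assumes that with probability ε the prior is ignored. All function names, the Gaussian prior, the toy acquisition, and the values of β and ε are hypothetical.

```python
import math
import random
from typing import Callable, Optional, Sequence


def gaussian_prior_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D Gaussian user prior over a single hyperparameter."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def prior_weighted_score(acq_value: float, prior_density: float,
                         n_evals: int, beta: float = 10.0) -> float:
    """piBO-style weighting (assumed here): acquisition * prior ** (beta / n).

    The exponent shrinks as n_evals grows, so the prior's influence decays.
    """
    return acq_value * prior_density ** (beta / max(n_evals, 1))


def select_candidate(candidates: Sequence[float],
                     acq_fn: Callable[[float], float],
                     prior_pdf: Callable[[float], float],
                     n_evals: int,
                     epsilon: float = 0.25,
                     beta: float = 10.0,
                     rng: Optional[random.Random] = None) -> float:
    """Pick the next configuration from a finite candidate set.

    Assumed convention: with probability epsilon the prior is ignored and the
    plain acquisition decides; otherwise the prior-weighted score decides.
    """
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        return max(candidates, key=acq_fn)
    return max(
        candidates,
        key=lambda x: prior_weighted_score(acq_fn(x), prior_pdf(x), n_evals, beta),
    )


# Toy setup: the acquisition prefers large x, the user prior believes in x near 0.1.
candidates = [0.1, 0.5, 0.9]


def acq(x: float) -> float:
    return x + 0.1


def prior(x: float) -> float:
    return gaussian_prior_pdf(x, mean=0.1, std=0.1)


early = select_candidate(candidates, acq, prior, n_evals=1, epsilon=0.0)
late = select_candidate(candidates, acq, prior, n_evals=1000, epsilon=0.0)
no_prior = select_candidate(candidates, acq, prior, n_evals=1, epsilon=1.0)
```

In this toy setup the prior dominates the early choice, while the late choice and the prior-free choice both follow the acquisition. That decay is one intuition for why a misleading prior need not hurt in the long run, which is the behavior the reviewers note when they say results hold up whether priors are good or bad.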
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research
