Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control
Maximilian Hüttenrauch, Gerhard Neumann

TL;DR
This paper introduces an improved regret-aware black-box optimization method that combines natural gradients, trust regions, and entropy control, outperforming ranking-based methods, especially on reinforcement learning tasks.
Contribution
The authors enhance the Model-based Relative Entropy Stochastic Search (MORE) algorithm by decoupling mean and covariance updates, introducing an entropy scheduling technique, and simplifying model learning, leading to faster convergence and better regret performance.
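A common way to decouple the mean and covariance updates of a Gaussian search distribution is to split the KL divergence between the new and old distributions into a mean term and a covariance term. The two terms sum to the full KL, and each can get its own trust-region bound. The sketch below assumes that construction. The helper names and the thresholds `eps_mean` and `eps_cov` are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_kl_split(mu, cov, mu_old, cov_old):
    """Return (mean_part, cov_part) of KL(N(mu, cov) || N(mu_old, cov_old)).

    The two parts sum to the full KL divergence, so each one can be
    constrained by its own trust-region threshold.
    """
    d = mu.shape[0]
    prec_old = np.linalg.inv(cov_old)
    diff = mu_old - mu
    # Mean part: Mahalanobis length of the mean shift under the old covariance.
    kl_mean = 0.5 * diff @ prec_old @ diff
    # Covariance part: depends only on the new and old covariances.
    _, logdet_old = np.linalg.slogdet(cov_old)
    _, logdet_new = np.linalg.slogdet(cov)
    kl_cov = 0.5 * (np.trace(prec_old @ cov) - d + logdet_old - logdet_new)
    return kl_mean, kl_cov


def within_trust_region(mu, cov, mu_old, cov_old, eps_mean, eps_cov):
    """Hypothetical check: each part must respect its own (assumed) bound."""
    kl_mean, kl_cov = gaussian_kl_split(mu, cov, mu_old, cov_old)
    return kl_mean <= eps_mean and kl_cov <= eps_cov
```

Because the mean term depends only on the mean shift and the covariance term only on the covariance change, the mean can move within its bound without consuming the budget that controls how quickly exploration shrinks.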
Findings
Outperforms ranking-based methods in regret on RL tasks.
Achieves competitive results on standard benchmark functions.
Faster convergence due to improved entropy scheduling.
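For a Gaussian search distribution, entropy has the closed form H = 0.5 (d (1 + ln 2π) + ln det Σ), and scaling Σ by a factor c shifts H by (d/2) ln c. The sketch below uses that fact to rescale the covariance onto a scheduled entropy target. The linear schedule and the post-hoc rescaling are illustrative assumptions. This summary does not say how the paper defines or enforces its entropy schedule, and the original MORE handles entropy through a constraint inside the update rather than by rescaling.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy of a d-dimensional Gaussian with covariance cov."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * (1.0 + np.log(2.0 * np.pi)) + logdet)


def linear_entropy_schedule(h_start, h_end, it, n_iters):
    """Hypothetical schedule: interpolate linearly, then hold at h_end."""
    frac = min(it / n_iters, 1.0)
    return h_start + frac * (h_end - h_start)


def match_entropy(cov, target):
    """Rescale cov so its entropy equals target.

    Scaling cov by c adds (d / 2) * log(c) to the entropy, so we solve for c.
    """
    d = cov.shape[0]
    c = np.exp(2.0 * (target - gaussian_entropy(cov)) / d)
    return c * cov
```

Rescaling leaves the shape of Σ (its principal directions and their relative lengths) untouched and only changes its overall volume, so the schedule controls how fast exploration contracts without overriding what the covariance update has learned.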
Abstract
Most successful stochastic black-box optimizers, such as CMA-ES, use rankings of the individual samples to obtain a new search distribution. Yet, the use of rankings also introduces several issues. First, the underlying optimization objective is often unclear, i.e., we do not optimize the expected fitness. Further, while these algorithms typically produce a high-quality mean estimate of the search distribution, the produced samples can have poor quality as these algorithms are ignorant of the regret. Lastly, noisy fitness function evaluations may result in solutions that are highly sub-optimal in expectation. In contrast, stochastic optimizers that are motivated by policy gradients, such as the Model-based Relative Entropy Stochastic Search (MORE) algorithm, directly optimize the expected fitness function without the use of rankings. MORE can be derived by applying natural policy…
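MORE fits a quadratic surrogate of the fitness to the current samples and then solves for the new Gaussian in closed form, so raw fitness values, not ranks, drive the update. The sketch below is a deliberately simplified version. It fits the surrogate by ridge-regularized least squares and uses a fixed temperature `eta` in place of the dual variables that MORE obtains from its KL and entropy constraints, so it has no entropy bound and no guaranteed KL bound. Under those assumptions the new distribution is proportional to π_old(x) · exp(f(x) / eta), which for a quadratic f is again Gaussian, provided eta · inv(Σ_old) − 2R is positive definite. The function names are hypothetical.

```python
import numpy as np


def fit_quadratic(xs, ys, ridge=1e-6):
    """Least-squares fit of f(x) ~ x^T R x + r^T x + r0 with symmetric R."""
    n, d = xs.shape
    iu = np.triu_indices(d)
    # Features: upper-triangular products x_i * x_j, linear terms, and a bias.
    quad = np.einsum("ni,nj->nij", xs, xs)[:, iu[0], iu[1]]
    feats = np.hstack([quad, xs, np.ones((n, 1))])
    a = feats.T @ feats + ridge * np.eye(feats.shape[1])
    w = np.linalg.solve(a, feats.T @ ys)
    # Unpack the upper-triangular coefficients into a symmetric R.
    m = len(iu[0])
    upper = np.zeros((d, d))
    upper[iu] = w[:m]
    big_r = 0.5 * (upper + upper.T)
    return big_r, w[m:m + d], w[-1]


def more_step(mean, cov, big_r, r, eta):
    """MORE-style update with a fixed temperature eta (assumption: no dual solve).

    Computes pi_new(x) proportional to pi_old(x) * exp(f(x) / eta).
    """
    prec_old = np.linalg.inv(cov)
    f_inv = eta * prec_old - 2.0 * big_r
    np.linalg.cholesky(f_inv)  # raises LinAlgError if f_inv is not positive definite
    new_cov = eta * np.linalg.inv(f_inv)
    new_mean = np.linalg.solve(f_inv, eta * prec_old @ mean + r)
    return new_mean, new_cov
```

Smaller values of `eta` move the mean further toward the surrogate's optimum; as `eta` grows, the update stays closer to the old distribution, which is the role the KL trust region plays in the full algorithm.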
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Metaheuristic Optimization Algorithms Research
