Simulating, Fast and Slow: Learning Policies for Black-Box Optimization
Fabio Valerio Massoli, Tim Bakker, Thomas Hehn, Tribhuvanesh Orekondy, Arash Behboodi

TL;DR
This paper presents a new active learning approach that trains a policy to efficiently guide surrogate models for black-box optimization, significantly reducing the number of costly simulator calls.
Contribution
It introduces a novel active learning policy for training differentiable surrogates, enabling efficient gradient-based optimization of black-box simulators for related problems.
Findings
Up to 90% fewer simulator calls compared to baselines
Effective for solving multiple related black-box optimization problems
Surrogate-based approach accelerates optimization process
Abstract
In recent years, solving optimization problems involving black-box simulators has become a point of focus for the machine learning community due to their ubiquity in science and engineering. The simulators describe a forward process from simulation parameters and input data to observations, and the goal of the optimization problem is to find parameters that minimize a desired loss function. Sophisticated optimization algorithms typically require gradient information regarding the forward process, $f_\text{sim}$, with respect to the parameters. However, obtaining gradients from black-box simulators can often be prohibitively expensive or, in some cases, impossible. Furthermore, in many applications, practitioners aim to solve a set of related problems. Thus, starting the optimization "ab initio", i.e. from…
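The setting in the abstract can be illustrated with a minimal sketch of surrogate-based optimization. This is not the paper's method: there is no learned policy, the neural surrogate is replaced by a local linear least-squares fit, and the parameter is one-dimensional. The simulator `f_sim` here is a hypothetical toy function, and `surrogate_slope`, `optimize`, and all hyperparameters are illustrative assumptions. The sketch shows only the basic loop: query the black box near the current parameters, fit a differentiable surrogate, and step along the surrogate's gradient while counting simulator calls.

```python
import random


def f_sim(psi):
    """Hypothetical toy black box: returns a loss value only, no gradient."""
    return (psi - 3.0) ** 2


def surrogate_slope(psi, n_samples, eps, rng, counter):
    """Fit a local linear surrogate y ~ a + b * delta by least squares; return b."""
    deltas = [rng.uniform(-eps, eps) for _ in range(n_samples)]
    ys = []
    for d in deltas:
        ys.append(f_sim(psi + d))  # each query is one (costly) simulator call
        counter["calls"] += 1
    mean_d = sum(deltas) / n_samples
    mean_y = sum(ys) / n_samples
    cov = sum((d - mean_d) * (y - mean_y) for d, y in zip(deltas, ys))
    var = sum((d - mean_d) ** 2 for d in deltas)
    return cov / var  # slope of the surrogate, used as a gradient estimate


def optimize(psi0, steps=50, lr=0.1, n_samples=8, eps=0.1, seed=0):
    """Gradient descent on the parameter using surrogate slopes only."""
    rng = random.Random(seed)
    counter = {"calls": 0}
    psi = psi0
    for _ in range(steps):
        psi -= lr * surrogate_slope(psi, n_samples, eps, rng, counter)
    return psi, counter["calls"]


psi_hat, n_calls = optimize(psi0=0.0)
print(f"estimated minimizer: {psi_hat:.3f}, simulator calls: {n_calls}")
```

In this sketch every step refits the surrogate from fresh simulator queries, so the call count grows as `steps * n_samples`. The summary above describes a policy that guides surrogate training to cut down exactly this kind of cost.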
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The approach is straightforward and intuitively appealing. The presentation and structure are generally clear and informative (with a few exceptions noted below). The ability to efficiently estimate simulation parameters is clearly important to many fields, and using local information is bound to be a winning strategy on some problems.
The fact that a single "simulator call" means calling $f_\text{sim}$ thousands of times should be prominently stated early on, and not hidden in the appendix. The figures give a false impression that the algorithms used here can work with dozens of $f_\text{sim}$ evaluations, while in fact they appear to require thousands, similar to SBI methods that do not rely on local gradient estimates. The total number of simulation calls required to train the policy network should also be reported. If we s
* The experiment relating to the antenna placement is interesting and could be the focus of an applications paper if expanded on.
* The motivation to work on black-box optimization and gradient-free approaches makes sense.
* The introduction is written well.
* It is good that the comparison is made with an ensemble of L-GSO models.
* The main weakness of the paper seems to be highlighted in the appendix, where algorithm 1 shows that the policy is required to be trained on simulated data prior to the actual black-box optimization in algorithm 2. However, this is not made clear in the main paper. This means that the results in the paper do not include the large number of simulations required to train a policy and seems to defeat the purpose of reducing the cost of simulation. Specifically, to train the policy, the simulator
- I am not an expert in the area, but the problem of amortizing an active learning strategy over multiple optimizations is an interesting problem (although I am not sure how relevant this problem is in practice).
- The method is tested on multiple real-world experiments, and is able to handle stochastic and non-stochastic simulators, suggesting a potential high impact for practitioners.
- Aside from the issue related to amortization (discussed above), the presentation is very clear.
If I understood correctly, I see two ways in which the method could be beneficial:
1. either the total number of simulator calls over the 2 phases ((1) training the policy network and (2) optimizing the function) is smaller than the number of simulator calls in other methods.
2. or, as training the policy network can be amortized over multiple input distributions q_i, if one needs to solve multiple such optimization problems using different input distributions, the cost of training ends up bei
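The two cases raised in this review can be written as a single break-even condition. The notation is an assumption introduced here for illustration and appears in neither the paper nor the reviews: $N_\text{train}$ is the number of simulator calls spent training the policy, $N_\text{opt}$ the calls per optimization problem when using the trained policy, $N_\text{base}$ the calls per problem for a baseline, and $K$ the number of related problems solved.

$$
N_\text{train} + K\,N_\text{opt} < K\,N_\text{base}
\quad\Longleftrightarrow\quad
K > \frac{N_\text{train}}{N_\text{base} - N_\text{opt}},
\qquad \text{assuming } N_\text{base} > N_\text{opt}.
$$

The first case corresponds to $K = 1$. If the "up to 90% fewer simulator calls" figure held uniformly, so that $N_\text{opt} = 0.1\,N_\text{base}$, the condition would become $K > N_\text{train} / (0.9\,N_\text{base})$. Checking it therefore requires the training cost $N_\text{train}$, which the reviewers note is not reported in the main paper.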
Taxonomy
Topics: Scheduling and Optimization Algorithms · Simulation Techniques and Applications
Methods: Sparse Evolutionary Training · Focus
