Offline Model-Based Optimization via Policy-Guided Gradient Search
Yassine Chemingui, Aryan Deshwal, Trong Nghia Hoang, Janardhan Rao Doppa

TL;DR
This paper introduces an offline optimization method based on policy-guided gradient search. It reformulates offline optimization as an offline reinforcement learning problem, with the aim of finding better designs than traditional surrogate-based methods.
Contribution
It proposes a learning-to-search framework for offline optimization that explicitly learns policies from the offline data to guide gradient search, mitigating the overestimation problem of surrogate models.
Findings
The method significantly improves optimization performance on multiple benchmarks.
It can be combined with existing offline surrogate models to improve their results.
It addresses the overestimation issue in surrogate-based offline optimization.
Abstract
Offline optimization is an emerging problem in many experimental engineering domains, including protein, drug, or aircraft design, where online experimentation to collect evaluation data is too expensive or dangerous. To avoid that, one has to optimize an unknown function given only its offline evaluations at a fixed set of inputs. A naive solution to this problem is to learn a surrogate model of the unknown function and optimize this surrogate instead. However, such a naive optimizer is prone to erroneous overestimation by the surrogate (possibly due to over-fitting on a biased sample of function evaluations) on inputs outside the offline dataset. Prior approaches addressing this challenge have primarily focused on learning robust surrogate models. However, their search strategies are derived from the surrogate model rather than the actual offline data. To fill this important gap, we…
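The naive baseline described in the abstract (fit a surrogate on the offline data, then optimize the surrogate) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the paper: the 1-D `hidden_objective`, the quadratic polynomial surrogate, the step size, and the helper names `fit_surrogate` and `gradient_ascent`. This is the naive optimizer only, not the paper's policy-guided method.

```python
import numpy as np

def fit_surrogate(X, y, degree=2):
    """Least-squares polynomial surrogate; returns coefficients, highest power first."""
    return np.polyfit(X, y, degree)

def gradient_ascent(coeffs, x0, step=0.05, n_steps=50):
    """Plain gradient ascent on the surrogate (not on the true function)."""
    grad = np.polyder(coeffs)  # coefficients of the surrogate's derivative
    x = float(x0)
    for _ in range(n_steps):
        x = x + step * np.polyval(grad, x)
    return x

# Hypothetical ground truth, unknown to the optimizer (illustration only).
def hidden_objective(x):
    return -(x - 2.0) ** 2

rng = np.random.default_rng(0)

# Offline dataset: inputs cover only [-1, 1], far from the true optimum at x = 2.
X = rng.uniform(-1.0, 1.0, size=50)
y = hidden_objective(X) + rng.normal(0.0, 0.05, size=50)

coeffs = fit_surrogate(X, y)

# Start the search from the best design seen in the offline data.
x_start = X[np.argmax(y)]
x_final = gradient_ascent(coeffs, x_start)

print("start:", x_start, "surrogate value:", np.polyval(coeffs, x_start))
print("final:", x_final, "surrogate value:", np.polyval(coeffs, x_final))
print("largest offline input:", X.max())
```

In this toy setup the search ends outside the range of inputs covered by the offline data, which is exactly the region where the surrogate's predictions are unsupported. Here the surrogate happens to extrapolate well because its model class matches the hidden function; with a flexible surrogate such as a neural network, the same procedure may climb toward inputs whose predicted value is erroneously high.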
Taxonomy
Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics
Methods: Sparse Evolutionary Training
