Meta-Black-Box-Optimization through Offline Q-function Learning
Zeyuan Ma, Zhiguang Cao, Zhou Jiang, Hongshu Guo, Yue-Jiao Gong

TL;DR
This paper introduces Q-Mamba, an offline reinforcement learning framework for Meta-Black-Box-Optimization that improves efficiency and effectiveness by transforming the task into a long-sequence decision process and employing novel offline learning strategies.
Contribution
Q-Mamba is the first offline RL framework for MetaBBO, utilizing a new task formulation, Q-function decomposition, and architecture design to enhance offline learning efficiency and performance.
Findings
Q-Mamba achieves superior performance compared to prior methods.
It significantly improves training efficiency over existing online baselines.
Extensive benchmarks validate its effectiveness and efficiency.
Abstract
Recent progress in Meta-Black-Box-Optimization (MetaBBO) has demonstrated that using RL to learn a meta-level policy for dynamic algorithm configuration (DAC) over an optimization task distribution can significantly enhance the performance of the low-level BBO algorithm. However, the online learning paradigms in existing works make the efficiency of MetaBBO problematic. To address this, we propose an offline learning-based MetaBBO framework, termed Q-Mamba, to attain both effectiveness and efficiency in MetaBBO. Specifically, we first transform the DAC task into a long-sequence decision process. This allows us to further introduce an effective Q-function decomposition mechanism to reduce the learning difficulty within the intricate algorithm configuration space. Under this setting, we propose three novel designs to meta-learn the DAC policy from offline data: we first propose a…
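The abstract does not spell out how the Q-function decomposition works, so the following is only a minimal sketch of one common scheme, assuming each hyperparameter is discretized into bins and chosen one dimension at a time, conditioned on the bins already picked. The names `greedy_decomposed_action` and `toy_q` are hypothetical, and `toy_q` is a hand-written stand-in for a learned Q-network, not the paper's model.

```python
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption, not taken from the paper): given the
# optimization state and the bins already chosen for earlier hyperparameters,
# return one Q-value per candidate bin of the next hyperparameter.
QFn = Callable[[Sequence[float], Sequence[int]], List[float]]


def greedy_decomposed_action(q_fn: QFn, state: Sequence[float], num_dims: int) -> List[int]:
    """Pick one bin per hyperparameter, one dimension at a time."""
    chosen: List[int] = []
    for _ in range(num_dims):
        q_values = q_fn(state, chosen)
        # Greedy choice within this single dimension only.
        best_bin = max(range(len(q_values)), key=q_values.__getitem__)
        chosen.append(best_bin)
    return chosen


def toy_q(state: Sequence[float], chosen: Sequence[int], num_bins: int = 4) -> List[float]:
    """Toy stand-in for a learned Q-network (illustrative only)."""
    # The preferred bin depends on the state and on how many dimensions
    # have already been decided.
    target = (len(chosen) + int(sum(state))) % num_bins
    return [-abs(b - target) for b in range(num_bins)]


action = greedy_decomposed_action(toy_q, state=[1.0, 0.5], num_dims=3)
print(action)  # [1, 2, 3]
```

With 3 hyperparameters and 4 bins each, a joint Q-function would have to score 4^3 = 64 configurations per step, whereas this per-dimension loop evaluates only 3 × 4 = 12 Q-values. Each configuration step then becomes several decision steps, which is one way a DAC task turns into a long-sequence decision process.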
Taxonomy
Topics: Metaheuristic Optimization Algorithms Research
Methods: Dynamic Algorithm Configuration · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Q-Learning
