Meta-Black-Box-Optimization through Offline Q-function Learning

Zeyuan Ma; Zhiguang Cao; Zhou Jiang; Hongshu Guo; Yue-Jiao Gong

arXiv:2505.02010·cs.NE·May 6, 2025

TL;DR

This paper introduces Q-Mamba, an offline reinforcement learning framework for Meta-Black-Box-Optimization that improves efficiency and effectiveness by transforming the task into a long-sequence decision process and employing novel offline learning strategies.

Contribution

Q-Mamba is the first offline RL framework for MetaBBO, utilizing a new task formulation, Q-function decomposition, and architecture design to enhance offline learning efficiency and performance.

Findings

01 Q-Mamba achieves superior performance compared to prior methods.

02 It significantly improves training efficiency over existing online baselines.

03 Extensive benchmarks validate its effectiveness and efficiency.

Abstract

Recent progress in Meta-Black-Box-Optimization (MetaBBO) has demonstrated that using RL to learn a meta-level policy for dynamic algorithm configuration (DAC) over an optimization task distribution can significantly enhance the performance of the low-level BBO algorithm. However, the online learning paradigms in existing works make the efficiency of MetaBBO problematic. To address this, we propose an offline learning-based MetaBBO framework, termed Q-Mamba, to attain both effectiveness and efficiency in MetaBBO. Specifically, we first transform the DAC task into a long-sequence decision process. This allows us to further introduce an effective Q-function decomposition mechanism to reduce the learning difficulty within the intricate algorithm configuration space. Under this setting, we propose three novel designs to meta-learn the DAC policy from offline data: we first propose a…
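The abstract does not spell out how the Q-function is decomposed, so the following is only an illustrative sketch of one common reading: instead of scoring every joint configuration at once, Q-values are produced one hyperparameter at a time, each conditioned on the choices already made. The function names, the discretization into bins, and the toy Q-function are all assumptions made for this example and are not taken from the paper or its repository.

```python
from typing import Callable, List, Optional

# Hypothetical signature: given a state and the bins already chosen for
# earlier hyperparameters, return one Q-value per bin of the next one.
QFn = Callable[[Optional[object], List[int]], List[float]]


def joint_action_count(num_params: int, num_bins: int) -> int:
    """Number of Q-values needed if every joint configuration is scored."""
    return num_bins ** num_params


def decomposed_output_count(num_params: int, num_bins: int) -> int:
    """Number of Q-values needed if each hyperparameter is scored in turn."""
    return num_params * num_bins


def greedy_sequential_action(q_fn: QFn, state: Optional[object],
                             num_params: int, num_bins: int) -> List[int]:
    """Pick one bin per hyperparameter, conditioning on earlier picks."""
    chosen: List[int] = []
    for _ in range(num_params):
        q_values = q_fn(state, chosen)
        best_bin = max(range(num_bins), key=lambda b: q_values[b])
        chosen.append(best_bin)
    return chosen


def make_toy_q(num_bins: int) -> QFn:
    """Toy stand-in for a learned Q-function (purely illustrative)."""
    def toy_q(state: Optional[object], prefix: List[int]) -> List[float]:
        # The preferred bin depends on the earlier choices, which is the
        # point of conditioning each step on the prefix.
        target = (len(prefix) + sum(prefix)) % num_bins
        return [-abs(b - target) for b in range(num_bins)]
    return toy_q


if __name__ == "__main__":
    print(joint_action_count(3, 4))       # 64 joint configurations
    print(decomposed_output_count(3, 4))  # 12 per-step Q-values
    print(greedy_sequential_action(make_toy_q(4), None, 3, 4))  # [0, 1, 3]
```

Under these assumptions, the number of Q-values to learn grows linearly rather than exponentially with the number of hyperparameters, at the cost of a longer decision sequence per configuration step.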

Code & Models

Repositories

metaevo/q-mamba (PyTorch, official)

Videos

Meta-Black-Box-Optimization through Offline Q-function Learning · SlidesLive

Taxonomy

Topics: Metaheuristic Optimization Algorithms Research

Methods: Dynamic Algorithm Configuration · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Q-Learning