Reinforced In-Context Black-Box Optimization

Lei Song; Chenxiao Gao; Ke Xue; Chenyang Wu; Dong Li; Jianye Hao; Zongzhang Zhang; Chao Qian

arXiv:2402.17423·cs.LG·November 4, 2024·1 cites

TL;DR

RIBBO is a reinforcement learning approach that trains a black-box optimization algorithm end-to-end using offline data, leveraging sequence models and regret-to-go tokens to adaptively generate query points across diverse tasks.

Contribution

The paper introduces RIBBO, a novel method that learns a BBO algorithm from offline data using sequence models and regret-to-go tokens, enabling flexible and automatic optimization.

Findings

01. Empirically outperforms existing BBO methods on benchmark functions.
02. Effectively adapts to hyper-parameter tuning and robot control tasks.
03. Demonstrates versatility across diverse optimization problems.

Abstract

Black-Box Optimization (BBO) has found successful applications in many fields of science and engineering. Recently, there has been a growing interest in meta-learning particular components of BBO algorithms to speed up optimization and get rid of tedious hand-crafted heuristics. As an extension, learning the entire algorithm from data requires the least labor from experts and can provide the most flexibility. In this paper, we propose RIBBO, a method to reinforce-learn a BBO algorithm from offline data in an end-to-end fashion. RIBBO employs expressive sequence models to learn the optimization histories produced by multiple behavior algorithms and tasks, leveraging the in-context learning ability of large models to extract task information and make decisions accordingly. Central to our method is to augment the optimization histories with regret-to-go tokens, which are designed…
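
The abstract is cut off before it says how regret-to-go tokens are computed. The sketch below is only an illustration under an assumed definition: for a maximization task, the regret-to-go at step t is the sum of instantaneous regrets (best known value minus observed value) over all queries made after step t. The function name `regret_to_go`, the parameter `y_star`, and the fallback of using the best value in the history when the optimum is unknown are assumptions made here, not taken from the paper or its repository.

```python
from typing import List, Optional


def regret_to_go(ys: List[float], y_star: Optional[float] = None) -> List[float]:
    """Return [R_0, ..., R_T] for one optimization history (maximization).

    Assumed definition: R_t is the sum of (y_star - y) over the queries
    made after step t, so R_T == 0 at the end of the history.
    """
    if not ys:
        return [0.0]
    if y_star is None:
        # Assumption: when the true optimum is unknown, use the best
        # value observed in this history as a stand-in.
        y_star = max(ys)
    regrets = [y_star - y for y in ys]
    rtg = [0.0] * (len(ys) + 1)
    # Accumulate from the end of the history backwards (suffix sums).
    for t in range(len(ys) - 1, -1, -1):
        rtg[t] = rtg[t + 1] + regrets[t]
    return rtg


history = [1.0, 3.0, 2.0]      # observed function values, in query order
print(regret_to_go(history))   # [3.0, 1.0, 1.0, 0.0]
```

Under this assumed definition a lower token means less regret remains to be incurred, which is the mirror image of the returns-to-go signal that Decision Transformer conditions on.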

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 8·Confidence 4

Strengths

The paper is well-written and clear. The presentation is easy to follow and the claims and contributions are clearly stated. The method is novel and performs better than state-of-the-art approaches on the selected experiments.

Weaknesses

The main weakness of this method is the fact that it has to transfer or generalize to new optimization problems, while algorithmic optimizers have to be tuned to new optimization problems. It is hard to judge how much training is needed for this method to perform well on any "similar" problem.

Reviewer 02·Rating 3·Confidence 3

Strengths

The BBO problem is a very relevant problem. Given that BBO can be framed as a special case of RL (where the state is kept constant), applying a decision transformer (DT) (or something similar) to this problem seems like an interesting approach.

Weaknesses

I see some weaknesses both in the proposed method (and how it differs from DT) and in the experimental evaluation.
Methodology:
* In my opinion, the manuscript would benefit from a more careful distinction from decision transformers, especially since the BBO setting is a special case of the standard RL setting (where the state is constant). Why do the authors use regret-to-go tokens as opposed to returns-to-go in DT? In DT, no "observation tokens" are added to the sequence, and instead returns

Reviewer 03·Rating 8·Confidence 3

Strengths

- The paper is well-written and clear.
- The literature is covered nicely in addition to motivating the proposed approach of learning the BBO algorithm in an E2E fashion.
- The experiments and the baselines demonstrate the potential of the proposed approach.
- The discussion section and ablation studies are comprehensive to understand the impact of each component in the algorithm.

Weaknesses

There are no significant weaknesses in this work, yet some important comments should be mentioned to improve this work.
- The limitation of this approach is not well highlighted.
- The quality of plots can be improved. I personally don't like the overlapping vertical lines that represent the standard deviation. A shaded region could be better.

Code & Models

Repositories

songlei00/ribbo
pytorch·Official

Taxonomy

Topics·Scheduling and Optimization Algorithms

Methods·SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings