READY: Reward Discovery for Meta-Black-Box Optimization

Zechuan Huang; Zhiguang Cao; Hongshu Guo; Yue-Jiao Gong; Zeyuan Ma

arXiv:2601.21847·cs.LG·January 30, 2026

TL;DR

This paper introduces READY, an automated reward discovery method using Large Language Models to improve Meta-Black-Box Optimization by evolving effective reward functions through parallel, knowledge-sharing processes.

Contribution

The paper proposes a novel LLM-based framework for automatic reward discovery in MetaBBO, addressing bias and efficiency issues in reward design.

Findings

01. Discovered reward functions enhance MetaBBO performance.
02. Parallel multi-task evolution accelerates reward discovery.
03. Empirical results show improved optimization outcomes.

Abstract

Meta-Black-Box Optimization (MetaBBO) is an emerging avenue within the optimization community, in which an algorithm design policy is meta-learned by reinforcement learning to enhance optimization performance. So far, the reward functions in existing MetaBBO works have been designed by human experts, which introduces design bias and the risk of reward hacking. In this paper, we use a Large Language Model (LLM) as an automated reward discovery tool for MetaBBO. Specifically, we consider both effectiveness and efficiency. For effectiveness, we borrow the idea of evolution of heuristics and introduce a tailored evolution paradigm into the iterative LLM-based program search process, which ensures continuous improvement. For efficiency, we additionally introduce a multi-task evolution architecture to support parallel reward discovery for diverse MetaBBO approaches. Such a parallel process also…
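The loop the abstract describes (iterative program search with an evolutionary paradigm, run for several MetaBBO approaches at once with knowledge shared between them) can be illustrated with a toy sketch. This is not the READY implementation. Everything here is an assumption made for illustration: `propose_variant` is a random-perturbation stand-in for an LLM call, `evaluate` is a toy score standing in for "train a MetaBBO agent with this reward and measure its performance", reward candidates are plain weight tuples instead of programs, and the tasks are visited in a sequential loop instead of truly in parallel.

```python
import random


def propose_variant(parent, rng):
    """Hypothetical stand-in for an LLM call that rewrites a parent reward.

    A real system would prompt an LLM with parent reward programs; here we
    only perturb a tuple of weights with Gaussian noise.
    """
    return tuple(w + rng.gauss(0.0, 0.3) for w in parent)


def evaluate(candidate, target):
    """Toy fitness (assumption): closer to a hidden target is better."""
    return -sum((w - t) ** 2 for w, t in zip(candidate, target))


def evolve(targets, generations=30, pop_size=6, seed=0):
    """Evolve one population of reward candidates per task.

    `targets` maps a task name (one MetaBBO approach) to its hidden target.
    Returns the final populations and the best fitness per generation.
    """
    rng = random.Random(seed)
    pops = {
        name: [tuple(rng.uniform(-1, 1) for _ in t) for _ in range(pop_size)]
        for name, t in targets.items()
    }
    history = {name: [] for name in targets}

    for _ in range(generations):
        # Snapshot each task's current best candidate for sharing.
        elites = {
            name: max(pops[name], key=lambda c: evaluate(c, targets[name]))
            for name in targets
        }
        for name, target in targets.items():
            pop = pops[name]
            # Offspring generated from this task's own population.
            children = [propose_variant(rng.choice(pop), rng)
                        for _ in range(pop_size)]
            # Knowledge sharing: variants of the other tasks' elites.
            migrants = [propose_variant(e, rng)
                        for other, e in elites.items() if other != name]
            # Elitist survival: parents compete with the newcomers, so the
            # best fitness of a task never decreases between generations.
            merged = pop + children + migrants
            merged.sort(key=lambda c: evaluate(c, target), reverse=True)
            pops[name] = merged[:pop_size]
            history[name].append(evaluate(pops[name][0], target))
    return pops, history


if __name__ == "__main__":
    tasks = {"task_a": (0.5, -0.2), "task_b": (0.4, -0.1)}
    _, hist = evolve(tasks)
    for name, scores in hist.items():
        print(name, "first:", scores[0], "last:", scores[-1])
```

Keeping the parents in the survival pool is one simple way to get a best score that never gets worse, which is the weakest reading of "continuous improvement"; the paper's actual evolution paradigm and sharing mechanism may differ.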

Taxonomy

Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research