READY: Reward Discovery for Meta-Black-Box Optimization
Zechuan Huang, Zhiguang Cao, Hongshu Guo, Yue-Jiao Gong, Zeyuan Ma

TL;DR
This paper introduces READY, an automated reward discovery method using Large Language Models to improve Meta-Black-Box Optimization by evolving effective reward functions through parallel, knowledge-sharing processes.
Contribution
The paper proposes a novel LLM-based framework for automatic reward discovery in MetaBBO, addressing bias and efficiency issues in reward design.
Findings
Discovered reward functions enhance MetaBBO performance.
Parallel multi-task evolution accelerates reward discovery.
Empirical results show improved optimization outcomes.
Abstract
Meta-Black-Box Optimization (MetaBBO) is an emerging avenue within the optimization community, where an algorithm design policy can be meta-learned by reinforcement learning to enhance optimization performance. So far, the reward functions in existing MetaBBO works have been designed by human experts, introducing design bias and the risk of reward hacking. In this paper, we use a Large Language Model (LLM) as an automated reward discovery tool for MetaBBO. Specifically, we consider both effectiveness and efficiency. On the effectiveness side, we borrow the idea of evolution of heuristics, introducing a tailored evolution paradigm into the iterative LLM-based program search process, which ensures continuous improvement. On the efficiency side, we additionally introduce a multi-task evolution architecture to support parallel reward discovery for diverse MetaBBO approaches. Such parallel process also…
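The search loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the LLM call is replaced by a hypothetical `propose` function that perturbs a weight vector, each reward candidate is a plain list of weights rather than a program, and `fitness` is a stand-in for training a MetaBBO agent with the candidate reward and scoring the result. The names `propose`, `fitness`, `discover`, and the `targets` dictionary are all assumptions made for illustration.

```python
import random


def propose(parent, rng, step=0.3):
    """Hypothetical stand-in for an LLM call that rewrites a reward program."""
    return [w + rng.gauss(0.0, step) for w in parent]


def fitness(candidate, target):
    """Toy proxy for 'train the agent with this reward, then score it'."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def discover(targets, generations=50, children=4, share=True, seed=0):
    """Evolve one reward candidate per task, optionally sharing across tasks."""
    rng = random.Random(seed)
    # One incumbent candidate per task (one task per MetaBBO approach).
    best = {name: [0.0] * len(t) for name, t in targets.items()}
    for _ in range(generations):
        # All tasks read the same snapshot, mimicking a parallel update.
        snapshot = dict(best)
        for name, target in targets.items():
            parents = [snapshot[name]]
            if share:
                # Knowledge sharing: incumbents of other tasks act as parents too.
                parents += [c for other, c in snapshot.items() if other != name]
            pool = [propose(p, rng) for p in parents for _ in range(children)]
            pool.append(snapshot[name])  # elitism: the incumbent is never lost
            best[name] = max(pool, key=lambda c: fitness(c, target))
    return best


if __name__ == "__main__":
    # Two made-up tasks whose ideal weights are similar but not identical.
    targets = {"task_a": [1.0, -0.5], "task_b": [0.8, -0.4]}
    for name, weights in discover(targets).items():
        print(name, [round(w, 3) for w in weights])
```

Because the incumbent always stays in the candidate pool, each task's score can never decrease from one generation to the next, which is one simple way to obtain the "continuous improvement" property. Setting `share=False` turns the loop into independent per-task searches, which makes it easy to compare against the knowledge-sharing variant on this toy problem.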
Taxonomy
Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
