ERFSL: An Efficient Reward Function Searcher via Language Models for Custom-Environment Multi-Objective Optimization (Student Abstract)
Guanwen Xie, Jingzehua Xu, Yiyuan Yang, Yimian Ding, Shuai Zhang

TL;DR
ERFSL leverages large language models to efficiently generate and optimize reward functions for multi-objective learning in custom environments, requiring minimal feedback iterations.
Contribution
This work introduces ERFSL, a novel LLM-based method for reward function search and optimization tailored for multi-objective tasks.
Findings
Reward critic corrects reward codes with one feedback iteration.
Diverse reward functions are acquired within the Pareto set.
Only 5.2 iterations needed to meet user requirements even with large weight deviations.
Abstract
We propose ERFSL, an efficient reward function searcher using large language models (LLMs) for custom-environment, multi-objective learning-based methods (LB). ERFSL generates reward components based on explicit user requirements, rectifies them using a reward critic, and iteratively optimizes the weights of these components based on textual context generated by the training log analyzer. Applied to a simulation-based benchmark task, the reward critic corrects reward codes with only one feedback iteration per requirement, and the reward weight initializer acquires diverse reward functions within the Pareto set. Even when a weight is off by a factor of 500, an average of only 5.2 iterations is needed to meet user requirements. The approach works adequately with GPT-4o mini and does not require advanced understanding capabilities.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
