Leveraging LLMs for reward function design in reinforcement learning control tasks
Franklin Cardenoso, Wouter Caarls

TL;DR
This paper presents LEARN-Opt, an autonomous LLM-based framework that designs reward functions for reinforcement learning without requiring preliminary evaluation metrics or environment source code, performing comparably to or better than state-of-the-art methods while reducing manual effort.
Contribution
LEARN-Opt introduces a fully autonomous, model-agnostic method for reward function generation that derives performance metrics directly from system descriptions, eliminating the need for human-engineered feedback.
Findings
LEARN-Opt performs comparably to or better than state-of-the-art methods.
Automated reward design exhibits high variance across runs, so multiple runs are needed.
Low-cost LLMs can find reward functions comparable in quality to those found by larger models.
Abstract
Designing effective reward functions in reinforcement learning (RL) is a significant bottleneck that often requires extensive human expertise and time. Previous work, together with recent advances in large language models (LLMs), has demonstrated the potential of LLMs to automate reward function generation. However, existing methods often require preliminary evaluation metrics, human-engineered feedback for the refinement process, or environment source code as context. To address these limitations, this paper introduces LEARN-Opt (LLM-based Evaluator and Analyzer for Reward functioN Optimization). This LLM-based, fully autonomous, and model-agnostic framework eliminates the need for preliminary metrics and environment source code as context to generate, execute, and evaluate reward function candidates from textual descriptions of…
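The abstract describes a loop that generates, executes, and evaluates reward function candidates, and the findings note high variance across runs. The sketch below is a generic toy illustration of that loop shape under loudly stated assumptions; it is not the authors' LEARN-Opt implementation. The LLM is replaced by a hypothetical stub (`propose_candidates`) that samples from a fixed family of rewards, RL training is replaced by a one-step greedy policy (`train_greedy_policy`), and the task metric (`task_metric`) is a hypothetical score derived from a textual goal ("reach x = 5"). All names, the toy system, and the constants are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy system: a point on a line starts at x = 0 and should
# reach x = GOAL; each step it moves by one of ACTIONS.
GOAL = 5.0
ACTIONS = (-1.0, 0.0, 1.0)

RewardFn = Callable[[float, float], float]  # (state, next_state) -> reward

# Stand-in for LLM output (hypothetical): a fixed family of candidate rewards.
REWARD_FAMILY: List[RewardFn] = [
    lambda s, s2: -abs(GOAL - s2),                 # dense distance penalty
    lambda s, s2: abs(GOAL - s) - abs(GOAL - s2),  # progress toward the goal
    lambda s, s2: 1.0 if s2 == GOAL else 0.0,      # sparse success bonus
    lambda s, s2: s2,                              # misspecified: "keep moving right"
]


def propose_candidates(rng: random.Random, n: int) -> List[int]:
    """Hypothetical proposer: samples indices into REWARD_FAMILY instead of querying an LLM."""
    return [rng.randrange(len(REWARD_FAMILY)) for _ in range(n)]


def train_greedy_policy(reward: RewardFn, start: float, horizon: int) -> List[float]:
    """Toy 'training': one-step greedy lookahead on the candidate reward.
    Ties go to the first action in ACTIONS."""
    s, traj = start, [start]
    for _ in range(horizon):
        s = max((s + a for a in ACTIONS), key=lambda s2: reward(s, s2))
        traj.append(s)
    return traj


def task_metric(traj: Sequence[float]) -> float:
    """Task-level score (higher is better) derived from the textual goal
    'reach x = GOAL'; it does not depend on the candidate reward."""
    return -abs(GOAL - traj[-1])


def single_run(seed: int, n_candidates: int = 4, horizon: int = 10) -> Tuple[float, int]:
    """Generate, execute, and evaluate candidates; return (best score, family index)."""
    rng = random.Random(seed)
    best: Tuple[float, int] = (float("-inf"), -1)
    for idx in propose_candidates(rng, n_candidates):
        traj = train_greedy_policy(REWARD_FAMILY[idx], start=0.0, horizon=horizon)
        score = task_metric(traj)
        if score > best[0]:
            best = (score, idx)
    return best


def best_of_runs(seeds: Sequence[int]) -> Tuple[float, int]:
    """Repeat independent runs and keep the best, since single runs vary."""
    return max((single_run(seed) for seed in seeds), key=lambda r: r[0])


if __name__ == "__main__":
    for seed in range(3):
        print(seed, single_run(seed))
    print("best:", best_of_runs(range(5)))
```

The key design point the sketch tries to show is that candidates are ranked by a task metric that is independent of the reward being tested, so a misspecified reward (the last family member) scores poorly even though it is trivially "maximized". In a real system the stub would be an LLM call and the greedy policy a full RL training run, which is where most of the cost lies.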
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Machine Learning and Data Classification
