A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning
Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li

TL;DR
This paper introduces CARD, a framework that uses large language models to iteratively generate and refine reward functions for reinforcement learning, reducing human effort and improving task performance.
Contribution
The paper presents a novel LLM-driven reward design framework with dynamic feedback mechanisms, including trajectory preference evaluation, to automate and enhance reward function creation.
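The iterative generate-and-refine loop can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not CARD's implementation. `coder`, `verify`, `tpe`, and `train_and_evaluate` are hypothetical callables standing in for the LLM Coder, its code check, the Trajectory Preference Evaluation step, and an RL training run that returns process and trajectory feedback. The stopping rule (`max_iters`, stop at the first reward whose training result passes) is also an assumption. The one structural point taken from the paper's abstract is that a reward failing TPE gets preference feedback without paying for RL training.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    passed: bool
    message: str


def design_reward(
    task: str,
    coder: Callable[[str, List[str]], str],           # hypothetical LLM Coder
    verify: Callable[[str], Optional[str]],           # hypothetical: error text or None
    tpe: Callable[[str], Feedback],                   # hypothetical preference check
    train_and_evaluate: Callable[[str], Feedback],    # hypothetical RL run + feedback
    max_iters: int = 5,
) -> Optional[str]:
    """Iteratively generate and refine reward code (all interfaces hypothetical)."""
    history: List[str] = []  # feedback handed back to the Coder each round
    for _ in range(max_iters):
        code = coder(task, history)
        error = verify(code)
        if error is not None:
            # Code check failed: retry with the error message.
            history.append(f"verification error: {error}")
            continue
        pref = tpe(code)
        if not pref.passed:
            # Preference feedback only; RL training is skipped this round.
            history.append(f"preference feedback: {pref.message}")
            continue
        # Only rewards that pass TPE pay for a full RL training run.
        result = train_and_evaluate(code)
        if result.passed:
            return code
        history.append(f"process/trajectory feedback: {result.message}")
    return None


# Usage with stub callables: the third draft is the first to pass every check.
drafts = iter(["bad syntax", "reward = -dist", "reward = -dist + success"])
code = design_reward(
    "reach goal",
    coder=lambda task, hist: next(drafts),
    verify=lambda c: "SyntaxError" if c == "bad syntax" else None,
    tpe=lambda c: Feedback("success" in c, "ranks failures above successes"),
    train_and_evaluate=lambda c: Feedback(True, "ok"),
)
print(code)  # reward = -dist + success
```

Placing the preference check before training is what makes the loop cheaper: in this sketch, rejected drafts never reach `train_and_evaluate`.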
Findings
Outperforms baselines on Meta-World and ManiSkill2 tasks.
Achieves better or comparable performance to expert-designed rewards on most tasks.
Surpasses the oracle reward on 3 tasks.
Abstract
Large Language Models (LLMs) have shown significant potential in designing reward functions for Reinforcement Learning (RL) tasks. However, obtaining high-quality reward code often involves human intervention, numerous LLM queries, or repetitive RL training. To address these issues, we propose CARD, an LLM-driven Reward Design framework that iteratively generates and improves reward function code. Specifically, CARD includes a Coder that generates and verifies the code, while an Evaluator provides dynamic feedback to guide the Coder in improving the code, eliminating the need for human feedback. In addition to process feedback and trajectory feedback, we introduce Trajectory Preference Evaluation (TPE), which evaluates the current reward function based on trajectory preferences. If the code fails the TPE, the Evaluator provides preference feedback, avoiding RL training at every iteration…
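One plausible reading of Trajectory Preference Evaluation is sketched below; the abstract does not give the exact procedure, so every concrete choice here is an assumption. The sketch assumes pairs of trajectories with a known preference (for example, a successful rollout preferred over a failed one) are available, scores each trajectory by summing the candidate reward over its transitions, and passes the reward if the fraction of correctly ordered pairs reaches `threshold`. The `(obs, action)` transition format, the pairwise-accuracy metric, and the default threshold of 0.9 are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a transition is an (observation, action) pair.
Transition = Tuple[Sequence[float], Sequence[float]]
Trajectory = List[Transition]
RewardFn = Callable[[Sequence[float], Sequence[float]], float]


def trajectory_return(reward_fn: RewardFn, traj: Trajectory) -> float:
    """Sum the candidate reward over every transition in a trajectory."""
    return sum(reward_fn(obs, act) for obs, act in traj)


def preference_accuracy(
    reward_fn: RewardFn, pairs: List[Tuple[Trajectory, Trajectory]]
) -> float:
    """Fraction of (preferred, rejected) pairs the reward ranks correctly."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(
        trajectory_return(reward_fn, better) > trajectory_return(reward_fn, worse)
        for better, worse in pairs
    )
    return correct / len(pairs)


def trajectory_preference_evaluation(
    reward_fn: RewardFn,
    pairs: List[Tuple[Trajectory, Trajectory]],
    threshold: float = 0.9,  # hypothetical pass threshold
) -> Tuple[bool, float]:
    """Pass if the reward agrees with enough of the given preferences."""
    acc = preference_accuracy(reward_fn, pairs)
    return acc >= threshold, acc


# Usage: a 1-D observation holding distance to goal; reward is negative distance.
def neg_distance(obs, act):
    return -obs[0]


good = [((0.5,), (0.0,)), ((0.1,), (0.0,))]  # moves close to the goal
bad = [((0.9,), (0.0,)), ((0.8,), (0.0,))]   # stays far away
passed, acc = trajectory_preference_evaluation(neg_distance, [(good, bad)])
print(passed, acc)  # True 1.0
```

A reward that fails this check would, under this reading, trigger preference feedback to the Coder instead of an RL training run.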
Code & Models
- 🤗 dogtooth/open-lm-1b-201305 (model)
- 🤗 dogtooth/open-lm-1b-201501 (model)
- 🤗 dogtooth/open-lm-1b-201701 (model)
- 🤗 dogtooth/open-lm-1b-201901 (model)
- 🤗 dogtooth/open-lm-1b-202101 (model)
- 🤗 dogtooth/open-lm-1b-202301 (model)
- 🤗 dogtooth/open-lm-1b-202407 (model)
- 🤗 dogtooth/open-lm-3b-201305 (model)
- 🤗 dogtooth/open-lm-3b-201501 (model)
- 🤗 dogtooth/open-lm-3b-201701 (model)
Taxonomy
Topics: Software Engineering Research
