Learning Reward for Physical Skills using Large Language Model
Yuwei Zeng, Yiqing Xu

TL;DR
This paper introduces a method that uses large language models to generate and iteratively refine reward functions for physical skill learning, addressing the challenges of high-dimensional state and action spaces and costly data collection.
Contribution
The paper presents a novel approach combining LLMs with environment feedback to create and optimize reward functions for physical skills, improving learning efficiency.
Findings
Effective reward functions generated for simulated physical tasks
Iterative self-alignment reduces ranking inconsistency
Method demonstrates improved learning support in simulations
Abstract
Learning reward functions for physical skills is challenging due to the vast spectrum of skills, the high dimensionality of the state and action spaces, and nuanced sensory feedback. The complexity of these tasks makes acquiring expert demonstration data both costly and time-consuming. Large Language Models (LLMs) contain valuable task-related knowledge that can aid in learning these reward functions. However, directly applying LLMs to propose reward functions has limitations, such as numerical instability and an inability to incorporate environment feedback. We aim to extract task knowledge from LLMs using environment feedback to create efficient reward functions for physical skills. Our approach consists of two components. We first use the LLM to propose features and parameterization of the reward function. Next, we update the parameters of this proposed reward function…
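The two-step idea in the abstract (propose a parameterized reward, then update its parameters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the reward is taken to be linear in two hypothetical features (progress toward a goal, energy used), the preference pairs are hard-coded stand-ins for rankings that would come from an LLM or from environment feedback, and the update rule is a plain pairwise logistic (Bradley-Terry style) gradient step that reduces the number of pairs the reward ranks inconsistently.

```python
import math

def reward(weights, features):
    """Linear reward: weighted sum of hand-named features (assumed form)."""
    return sum(w * f for w, f in zip(weights, features))

def count_inconsistent(weights, ranked_pairs):
    """Number of (better, worse) pairs that the reward ranks the wrong way."""
    return sum(
        1 for better, worse in ranked_pairs
        if reward(weights, better) <= reward(weights, worse)
    )

def update_weights(weights, ranked_pairs, lr=0.5, epochs=200):
    """Gradient ascent on the pairwise logistic log-likelihood."""
    w = list(weights)
    for _ in range(epochs):
        for better, worse in ranked_pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-margin))  # P(better preferred)
            # Gradient of log sigmoid(margin) w.r.t. w is (1 - p) * diff.
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w

# Hypothetical features per trajectory: [progress_to_goal, energy_used].
# Each tuple is (preferred trajectory, less preferred trajectory).
ranked_pairs = [
    ([0.9, 0.2], [0.4, 0.3]),
    ([0.7, 0.1], [0.6, 0.8]),
    ([0.8, 0.5], [0.2, 0.4]),
    ([0.5, 0.2], [0.5, 0.9]),
]

initial_weights = [0.0, 1.0]  # a poor initial guess that rewards energy use
before = count_inconsistent(initial_weights, ranked_pairs)
learned_weights = update_weights(initial_weights, ranked_pairs)
after = count_inconsistent(learned_weights, ranked_pairs)
print("inconsistent pairs before:", before, "after:", after)
```

In this toy setting the update moves the weight on progress up and the weight on energy down until the reward's ordering agrees with the given preferences. How the paper actually obtains rankings and updates parameters is not visible in the truncated abstract, so this should be read only as one plausible shape for such a loop.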
Taxonomy
Topics: Topic Modeling
