Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
Yuwei Zeng, Yao Mu, Lin Shao

TL;DR
This paper introduces a method that uses Large Language Models to learn reward functions for robot skills more efficiently. It relies on a self-alignment process that minimizes ranking inconsistencies and is validated on 9 tasks across 2 simulation environments.
Contribution
The paper proposes a new approach combining LLMs and self-alignment to improve reward learning efficiency without human supervision, reducing token usage and enhancing training effectiveness.
Findings
Improved training efficacy and efficiency across 9 tasks.
Significantly fewer GPT tokens used compared to mutation-based methods.
Consistent performance gains demonstrated in simulation environments.
Abstract
Learning reward functions remains the bottleneck to equipping a robot with a broad repertoire of skills. Large Language Models (LLMs) contain valuable task-related knowledge that can potentially aid in learning reward functions. However, the proposed reward function can be imprecise and thus ineffective, so it needs to be further grounded in environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: we first use the LLM to propose features and a parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on the execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement…
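The parameter-update half of the self-alignment idea can be illustrated with a minimal sketch. Several assumptions here are not taken from the paper: the reward is linear in the LLM-proposed features, the LLM's judgement of the rollouts arrives as pairwise preferences (better, worse), and the parameters are updated by gradient descent on a pairwise logistic (Bradley-Terry) loss, swapped in as a smooth surrogate for "minimize ranking inconsistency". The feature names, data, and functions (`reward`, `ranking_inconsistency`, `align_step`) are hypothetical, and the policy-training and LLM-querying parts of the loop are omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: a rollout is summarized by a vector of LLM-proposed features.
Features = Sequence[float]
Pref = Tuple[int, int]  # (better, worse) rollout indices, as judged by the LLM


def reward(theta: Sequence[float], phi: Features) -> float:
    """Assumed linear reward r_theta(tau) = theta . phi(tau)."""
    return sum(t * f for t, f in zip(theta, phi))


def ranking_inconsistency(theta: Sequence[float], feats: Sequence[Features],
                          prefs: Sequence[Pref]) -> float:
    """Fraction of LLM preferences that the learnt reward fails to reproduce."""
    bad = sum(1 for i, j in prefs
              if reward(theta, feats[i]) <= reward(theta, feats[j]))
    return bad / len(prefs)


def align_step(theta: Sequence[float], feats: Sequence[Features],
               prefs: Sequence[Pref], lr: float = 0.5) -> List[float]:
    """One gradient step on the mean loss -log sigmoid(r(better) - r(worse)),
    a smooth surrogate for the inconsistency count (an assumption, not the paper's rule)."""
    grad = [0.0] * len(theta)
    for i, j in prefs:
        margin = reward(theta, feats[i]) - reward(theta, feats[j])
        weight = 1.0 / (1.0 + math.exp(margin))  # equals 1 - sigmoid(margin)
        for k in range(len(theta)):
            grad[k] -= weight * (feats[i][k] - feats[j][k])
    return [t - lr * g / len(prefs) for t, g in zip(theta, grad)]


# Hypothetical rollouts with features [distance_to_goal, grasp_success].
feats = [[0.1, 1.0], [0.5, 1.0], [0.2, 0.0], [0.9, 0.0]]
# Stand-in for the LLM's ranking of these rollouts.
prefs = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
theta = [1.0, 0.0]  # imprecise initial guess: rewards moving away from the goal

print(ranking_inconsistency(theta, feats, prefs))  # 0.8
for _ in range(200):
    theta = align_step(theta, feats, prefs)
print(ranking_inconsistency(theta, feats, prefs))  # 0.0
```

Here the rollouts and the ranking are fixed for illustration. In the iterative process the abstract describes, each update would instead be driven by execution feedback from new rollouts, so the features being ranked change between iterations.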
Taxonomy
Topics: Advanced Data Processing Techniques
Methods: Linear Layer · Discriminative Fine-Tuning · Multi-Head Attention · Layer Normalization · Dense Connections · Attention Dropout · Weight Decay · Cosine Annealing · Dropout
