Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

Yuwei Zeng; Yao Mu; Lin Shao

arXiv:2405.07162·cs.RO·May 17, 2024·2 cites

Open Access

TL;DR

This paper introduces a novel method that leverages Large Language Models to learn reward functions for robot skills more efficiently by using a self-alignment process that minimizes ranking inconsistencies, validated across multiple tasks and environments.

Contribution

The paper proposes a new approach combining LLMs and self-alignment to improve reward learning efficiency without human supervision, reducing token usage and enhancing training effectiveness.

Findings

01. Improved training efficacy and efficiency across 9 tasks.

02. Significantly fewer GPT tokens used compared to mutation-based methods.

03. Consistent performance gains demonstrated in simulation environments.

Abstract

Learning reward functions remains the bottleneck in equipping a robot with a broad repertoire of skills. Large Language Models (LLMs) contain valuable task-related knowledge that can potentially aid in learning reward functions. However, a reward function proposed by an LLM can be imprecise and therefore ineffective, so it needs to be further grounded with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: we first use the LLM to propose features and a parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on the execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement…
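The idea of minimizing ranking inconsistency can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss or the reward form, so the code assumes a linear reward over hand-made features and a Bradley-Terry style logistic loss on preference pairs. The feature matrix, the hard-coded ranking standing in for the LLM's ranking, and the function names `ranking_loss_grad`, `count_inconsistent`, and `align` are all hypothetical.

```python
import numpy as np

def ranking_loss_grad(w, feats, prefs):
    """Logistic pairwise loss (assumed form) and its gradient.

    feats: (n, d) array of per-trajectory features.
    prefs: list of (better, worse) index pairs from the ranker.
    """
    loss = 0.0
    grad = np.zeros_like(w)
    for i, j in prefs:
        diff = feats[i] - feats[j]
        margin = w @ diff                      # reward gap for this pair
        loss += np.logaddexp(0.0, -margin)     # log(1 + exp(-margin))
        grad += -diff / (1.0 + np.exp(margin))
    return loss / len(prefs), grad / len(prefs)

def count_inconsistent(w, feats, prefs):
    """Number of pairs where the learnt reward disagrees with the ranking."""
    scores = feats @ w
    return sum(1 for i, j in prefs if scores[i] <= scores[j])

def align(w0, feats, prefs, lr=0.5, steps=200):
    """Plain gradient descent on the pairwise ranking loss."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        _, g = ranking_loss_grad(w, feats, prefs)
        w -= lr * g
    return w

# Hypothetical data: three rollouts, two features each.
feats = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
# Stand-in for an LLM ranking: rollout 0 > rollout 1 > rollout 2.
prefs = [(0, 1), (1, 2), (0, 2)]
w_init = np.array([-1.0, 1.0])   # initial parameters that rank the rollouts backwards

before = count_inconsistent(w_init, feats, prefs)
w_fit = align(w_init, feats, prefs)
after = count_inconsistent(w_fit, feats, prefs)
print(before, after)
```

In an iterative setting, one would presumably retrain the policy with the updated reward, collect new rollouts, obtain a new ranking, and repeat; that outer loop is omitted here because the abstract does not describe its details.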

Taxonomy

Topics: Advanced Data Processing Techniques

Methods: Linear Layer · Discriminative Fine-Tuning · Multi-Head Attention · Layer Normalization · Dense Connections · Attention Dropout · Weight Decay · Cosine Annealing · Dropout