Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

Songjun Tu; Jingbo Sun; Qichao Zhang; Xiangyuan Lan; Dongbin Zhao

arXiv:2412.16878·cs.LG·December 24, 2024

TL;DR

This paper introduces RL-SaLLM-F, a novel method for online preference-based reinforcement learning that uses large language models to generate self-augmented feedback, eliminating the need for privileged reward information and improving feedback quality.

Contribution

The paper proposes a new LLM-based feedback mechanism for online PbRL that addresses query ambiguity and enhances feedback reliability without relying on privileged information.

Findings

01. Self-augmented LLM feedback outperforms scripted teacher feedback.

02. The double-check mechanism improves feedback reliability.

03. Method achieves competitive results on MetaWorld benchmarks.

Abstract

Preference-based reinforcement learning (PbRL) provides a powerful paradigm that avoids meticulous reward engineering by learning rewards from human preferences. However, real-time human feedback is hard to obtain in online tasks. Most works assume a "scripted teacher" that uses a privileged, predefined reward to provide preference feedback. In this paper, we propose an RL Self-augmented Large Language Model Feedback (RL-SaLLM-F) technique that does not rely on privileged information for online PbRL. RL-SaLLM-F leverages the reflective and discriminative capabilities of LLMs to generate self-augmented trajectories and provide preference labels for reward learning. First, we identify a failure mode in LLM-based preference discrimination in online PbRL, namely "query ambiguity". Then an LLM is employed to provide preference labels and generate self-augmented imagined…
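
The abstract does not state the reward-learning objective. As a rough illustration of how preference labels can train a reward model, the sketch below assumes the Bradley-Terry formulation that is common in PbRL; it is not taken from the paper, and the function names `preference_probability` and `preference_loss` are hypothetical.

```python
import numpy as np


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Assumes a Bradley-Terry model on the summed predicted rewards of each
    trajectory segment (an assumption, not confirmed by the abstract).
    """
    diff = float(np.sum(rewards_a)) - float(np.sum(rewards_b))
    # Sigmoid of the return difference.
    return float(1.0 / (1.0 + np.exp(-diff)))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and the given label.

    label is 1.0 when A is preferred and 0.0 when B is preferred; the label
    could come from a human, a scripted teacher, or an LLM.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return float(-(label * np.log(p + eps) + (1.0 - label) * np.log(1.0 - p + eps)))
```

With equal summed rewards, for example `preference_probability([1.0, 2.0], [2.0, 1.0])`, the model is indifferent and returns 0.5. Minimizing `preference_loss` over labeled pairs pushes the predicted rewards of preferred segments above those of rejected ones.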

Code & Models

Repositories

tu2021/rl-sallm-f (PyTorch, official)

Taxonomy

Topics: Advanced Text Analysis Techniques · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining