Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs

Zichao Shen; Tianchen Zhu; Qingyun Sun; Shiqi Gao; Jianxin Li

arXiv:2406.19644 · cs.AI · July 2, 2024 · 1 citation

Open Access

TL;DR

This paper introduces LLM4PG, a framework that uses large language models to automatically generate preferences and improve reinforcement learning in complex game environments, reducing reliance on human input.

Contribution

The paper presents a novel LLM-enabled method for automatic preference generation that enhances RL performance in complex, constraint-rich environments.

Findings

01. LLM4PG accelerates RL convergence in complex tasks.

02. It overcomes stagnation caused by poor reward signals.

03. It reduces dependence on human preference data.

Abstract

Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks due to the difficulty of designing comprehensive and precise reward functions. This inherent difficulty curtails the broader application of RL within game environments characterized by diverse constraints. Preference-based reinforcement learning (PbRL) presents a pioneering framework that capitalizes on human preferences as pivotal reward signals, thereby circumventing the need for meticulous reward engineering. However, obtaining preference data from human experts is costly and inefficient, especially under conditions marked by complex constraints. To tackle this challenge, we propose an LLM-enabled automatic preference generation framework named LLM4PG, which harnesses the capabilities of large language models (LLMs) to abstract trajectories, rank preferences, and reconstruct…
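As a rough illustration of how preference labels can stand in for a hand-designed reward, the sketch below uses the Bradley-Terry model that is common in the PbRL literature. The abstract excerpt does not say which preference model LLM4PG uses, so this is an assumption, and the function names (`trajectory_return`, `preference_probability`, `preference_loss`) are hypothetical. The label would come from whichever ranker is available, whether a human or an LLM.

```python
import math


def trajectory_return(rewards):
    """Sum of the learned per-step rewards along one trajectory."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that trajectory A is preferred over B.

    Assumption: preference depends only on the difference in summed reward.
    """
    diff = trajectory_return(rewards_a) - trajectory_return(rewards_b)
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and a ranker's label.

    label is 1.0 if A was preferred and 0.0 if B was preferred.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Minimizing this loss over many labelled trajectory pairs pushes a reward model to assign higher returns to the preferred trajectories, which is the general mechanism by which preference data replaces manual reward engineering.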

Taxonomy

Topics: Reinforcement Learning in Robotics · Safety Systems Engineering in Autonomy · Traffic control and management