Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models

Maryam Shoaeinaeini; Brent Harrison

arXiv:2411.14457·cs.LG·November 25, 2024

TL;DR

This paper introduces a calibrated guidance system using Monte Carlo Dropout and a dynamic entropy-based policy shaping method to improve the reliability of Large Language Models in guiding reinforcement learning, especially in sequential tasks.

Contribution

The paper combines uncertainty calibration with an adaptive weighting of guidance influence to improve LLM-guided RL training and to counter LLM overconfidence.

Findings

01. Calibrated LLM guidance improves RL performance.

02. Average entropy effectively reflects guidance uncertainty.

03. Method outperforms uncalibrated LLMs and unguided RL in experiments.

Abstract

Human guidance in reinforcement learning (RL) is often impractical for large-scale applications due to high costs and time constraints. Large Language Models (LLMs) offer a promising alternative to mitigate RL sample inefficiency and potentially replace human trainers. However, applying LLMs as RL trainers is challenging due to their overconfidence and less reliable solutions in sequential tasks. We address this limitation by introducing a calibrated guidance system that uses Monte Carlo Dropout to enhance LLM advice reliability by assessing prediction variances from multiple forward passes. Additionally, we develop a novel RL policy shaping method based on dynamic model average entropy to adjust the LLM's influence on RL policies according to guidance uncertainty. This approach ensures robust RL training by relying on reliable LLM guidance. To validate our contributions, we conduct…
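
The two ideas in the abstract can be illustrated with a small Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names are hypothetical, the toy model that randomly zeroes logits only stands in for an LLM run with dropout left on at inference, and the linear mixing rule is one plausible way to let entropy control the LLM's influence, not the authors' method.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_probs(stochastic_forward, n_passes=20):
    """Average action probabilities over several stochastic forward passes."""
    runs = [softmax(stochastic_forward()) for _ in range(n_passes)]
    n_actions = len(runs[0])
    return [sum(r[a] for r in runs) / n_passes for a in range(n_actions)]


def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def shape_policy(rl_probs, llm_probs):
    """Mix the two distributions; the LLM weight shrinks as its entropy grows.

    Assumption: a linear mix with weight w = 1 - normalized entropy.
    """
    w = min(1.0, max(0.0, 1.0 - normalized_entropy(llm_probs)))
    mixed = [(1.0 - w) * r + w * l for r, l in zip(rl_probs, llm_probs)]
    total = sum(mixed)
    return [m / total for m in mixed]


def make_noisy_model(base_logits, drop_rate=0.1, seed=0):
    """Hypothetical stand-in for an LLM with dropout active at inference."""
    rng = random.Random(seed)

    def forward():
        # Zero each logit with probability drop_rate to imitate dropout noise.
        return [0.0 if rng.random() < drop_rate else x for x in base_logits]

    return forward


if __name__ == "__main__":
    llm = make_noisy_model([2.0, 0.5, 0.1])
    llm_probs = mc_dropout_probs(llm, n_passes=50)
    rl_probs = [1 / 3, 1 / 3, 1 / 3]
    print("LLM guidance:", llm_probs)
    print("Uncertainty :", normalized_entropy(llm_probs))
    print("Shaped policy:", shape_policy(rl_probs, llm_probs))
```

Under this mixing rule, fully uncertain guidance (a uniform distribution) leaves the RL policy unchanged, while fully confident guidance replaces it. Averaging over passes before computing entropy is also a design choice; averaging the per-pass entropies instead would be an equally reasonable reading of "average entropy".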

Taxonomy

Topics: Topic Modeling