Enhancing Q-Learning with Large Language Model Heuristics

Xiefeng Wu

arXiv:2405.03341·cs.LG·May 27, 2024

TL;DR

This paper introduces LLM-guided Q-learning, a framework that uses large language models as heuristics when learning the Q-function. The aim is to improve sample efficiency in reinforcement learning while tolerating LLM hallucinations and avoiding ineffective exploration.

Contribution

The paper presents a method that integrates LLMs as heuristics into Q-learning, supported by theoretical analysis and positioned as an alternative to existing reward shaping techniques.

Findings

01. Improves sample efficiency in reinforcement learning.

02. Prevents ineffective exploration and handles hallucinations.

03. Demonstrates robustness and generality across tasks.

Abstract

Q-learning excels in learning from feedback within sequential decision-making tasks but often requires extensive sampling to achieve significant improvements. While reward shaping can enhance learning efficiency, non-potential-based methods introduce biases that affect performance, and potential-based reward shaping, though unbiased, lacks the ability to provide heuristics for state-action pairs, limiting its effectiveness in complex environments. Large language models (LLMs) can achieve zero-shot learning for simpler tasks, but they suffer from low inference speeds and occasional hallucinations. To address these challenges, we propose LLM-guided Q-learning, a framework that leverages LLMs as heuristics to aid in learning the Q-function for reinforcement learning. Our theoretical analysis demonstrates that this approach adapts to hallucinations, improves sample efficiency, and…
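To make the idea of a state-action heuristic concrete, here is a minimal sketch in Python. It is not the paper's algorithm, whose exact formulation is not given in this excerpt. It assumes, purely for illustration, that a hint table `h` (standing in for LLM output) is added to the learned values only when choosing actions, while the Q-function itself is trained with the standard one-step Q-learning update. The names `greedy_action`, `td_update`, and `h` are hypothetical.

```python
from collections import defaultdict


def greedy_action(q, h, state, actions):
    """Pick the action with the highest learned value plus heuristic hint.

    Assumption: the hint only biases action selection. Ties go to the
    first action in `actions`.
    """
    return max(actions, key=lambda a: q[(state, a)] + h.get((state, a), 0.0))


def td_update(q, state, action, reward, next_state, actions,
              alpha=0.5, gamma=0.9, done=False):
    """Standard one-step Q-learning update; the hint is not in the target."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    actions = ("left", "right")
    q = defaultdict(float)        # learned values start at zero
    h = {("s", "left"): 0.5}      # a wrong ("hallucinated") hint favouring left

    # Toy one-step task: "left" actually yields -1, so the hint is misleading.
    for _ in range(2):
        a = greedy_action(q, h, "s", actions)
        td_update(q, "s", a, reward=-1.0, next_state=None,
                  actions=actions, done=True)

    # After two corrective updates the learned value outweighs the hint.
    print(greedy_action(q, h, "s", actions))  # prints "right"
```

In this toy setup a misleading hint costs two bad steps before environment feedback outweighs it, which is one simple way a learner could recover from an incorrect heuristic. A correct hint would instead steer the first action without any exploration.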

Taxonomy

Topics: Speech and dialogue systems