Enhancing Q-Learning with Large Language Model Heuristics
Xiefeng Wu

TL;DR
This paper introduces LLM-guided Q-learning, a framework that uses large language models as heuristics to improve sample efficiency and robustness in reinforcement learning while tolerating LLM hallucinations and avoiding ineffective exploration.
Contribution
It presents a new method integrating LLMs as heuristics into Q-learning, with theoretical guarantees and practical benefits over existing reward shaping techniques.
Findings
Improves sample efficiency in reinforcement learning.
Prevents ineffective exploration and handles hallucinations.
Demonstrates robustness and generality across tasks.
Abstract
Q-learning excels in learning from feedback within sequential decision-making tasks but often requires extensive sampling to achieve significant improvements. While reward shaping can enhance learning efficiency, non-potential-based methods introduce biases that affect performance, and potential-based reward shaping, though unbiased, lacks the ability to provide heuristics for state-action pairs, limiting its effectiveness in complex environments. Large language models (LLMs) can achieve zero-shot learning for simpler tasks, but they suffer from low inference speeds and occasional hallucinations. To address these challenges, we propose LLM-guided Q-learning, a framework that leverages LLMs as heuristics to aid in learning the Q-function for reinforcement learning. Our theoretical analysis demonstrates that this approach adapts to hallucinations, improves sample efficiency, and…
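The idea of a state-action heuristic guiding Q-learning can be illustrated with a minimal tabular sketch. This is not the paper's algorithm, whose details are not given here. It assumes a toy 1-D chain environment and a hypothetical `heuristic` table standing in for LLM-suggested values. The heuristic only biases greedy action selection, while the Q-update stays the standard temporal-difference rule, so a wrong heuristic entry can be corrected by observed rewards.

```python
import random

def q_learning_with_heuristic(n_states=6, episodes=200, alpha=0.5, gamma=0.9,
                              epsilon=0.1, heuristic=None, seed=0):
    """Tabular Q-learning on a 1-D chain; reward 1.0 on reaching the last state.

    `heuristic` maps (state, action) -> bonus. It is a hypothetical stand-in
    for values an LLM might suggest, not the paper's actual interface.
    """
    rng = random.Random(seed)
    actions = (-1, 1)  # move left or right along the chain
    h = heuristic or {}
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                a = rng.choice(actions)  # occasional random exploration
            else:
                # Greedy choice over Q plus the heuristic bonus (assumption).
                a = max(actions, key=lambda act: q[(s, act)] + h.get((s, act), 0.0))
            s2 = min(max(s + a, 0), n_states - 1)  # clamp to the chain
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            # Standard Q-learning target; the heuristic does not enter it.
            target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
            if done:
                break
    return q

# Usage: a heuristic that favors moving right in every state.
hint = {(s, 1): 1.0 for s in range(6)}
q_table = q_learning_with_heuristic(heuristic=hint)
```

Keeping the heuristic out of the update target is one simple way to limit the damage from a bad hint. Other designs, such as adding the heuristic to the Q-values themselves, are possible and may be closer to what the paper does.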
Taxonomy
Topics: Speech and dialogue systems
