TL;DR
PriorZero introduces a novel framework that combines Large Language Model priors with world-model-based planning, enhancing exploration and performance in complex decision-making tasks.
Contribution
PriorZero proposes a decoupled rollout-training approach that integrates LLM priors into MCTS and fine-tunes the LLM on interaction data, addressing the prior-dynamics mismatch.
Findings
Improves exploration efficiency in text-based and gridworld tasks.
Enhances asymptotic performance across diverse benchmarks.
Demonstrates effective LLM-empowered decision-making.
Abstract
Leveraging the rich world knowledge of Large Language Models (LLMs) to enhance Reinforcement Learning (RL) agents offers a promising path toward general intelligence. However, a fundamental prior-dynamics mismatch hinders existing approaches: static LLM knowledge cannot directly adapt to the complex transition dynamics of long-horizon tasks. Using LLM priors as fixed policies limits exploration diversity because the prior is blind to environment-specific dynamics, whereas end-to-end fine-tuning suffers from optimization instability and credit-assignment issues. To bridge this gap, we propose PriorZero, a unified framework that integrates LLM-derived conceptual priors into world-model-based planning through a decoupled rollout-training design. During rollout, a novel root-prior injection mechanism incorporates LLM priors exclusively at the root node of Monte Carlo Tree Search (MCTS), focusing…
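The abstract above is truncated, so the exact form of root-prior injection is not specified here. As a rough illustration of the idea that the LLM prior enters the search only at the root, the sketch below assumes a convex mix of the LLM's action distribution with the world model's policy prior at depth 0, and an AlphaZero-style PUCT rule for child selection. Everything in it (`Node`, `inject_root_prior`, `priors_for`, the mixing weight `alpha`, `c_puct`) is hypothetical and is not taken from the PriorZero paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical MCTS node: prior probability, visit statistics, children keyed by action.
    prior: float = 0.0
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def inject_root_prior(policy_prior, llm_prior, alpha=0.5):
    """Assumed injection rule: convex mix of policy prior and LLM prior, renormalized."""
    actions = list(dict.fromkeys([*policy_prior, *llm_prior]))  # stable action order
    mixed = {a: (1 - alpha) * policy_prior.get(a, 0.0) + alpha * llm_prior.get(a, 0.0)
             for a in actions}
    total = sum(mixed.values())
    if total <= 0:
        raise ValueError("priors must contain positive mass")
    return {a: p / total for a, p in mixed.items()}


def priors_for(depth, policy_prior, llm_prior, alpha=0.5):
    # The LLM prior is used only at the root (depth 0); deeper nodes use the policy prior alone.
    if depth == 0:
        return inject_root_prior(policy_prior, llm_prior, alpha)
    return dict(policy_prior)


def expand(node, priors):
    # Create one child per action, storing its prior probability.
    for action, p in priors.items():
        node.children[action] = Node(prior=p)


def puct_select(node, c_puct=1.25):
    # AlphaZero-style PUCT: Q(s, a) + c * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
    sqrt_n = math.sqrt(node.visit_count)

    def score(item):
        _, child = item
        return child.value() + c_puct * child.prior * sqrt_n / (1 + child.visit_count)

    return max(node.children.items(), key=score)
```

For example, with `policy_prior = {"north": 0.7, "south": 0.3}`, `llm_prior = {"north": 0.1, "south": 0.9}` and `alpha=0.5`, the root receives priors of about 0.4 and 0.6, while every non-root node keeps 0.7 and 0.3. In this sketch, restricting injection to the root means the LLM is queried once per search rather than once per expanded node.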
