Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search
Nicola Dainese, Matteo Merler, Minttu Alakuijala, Pekka Marttinen

TL;DR
This paper introduces GIF-MCTS, a novel strategy combining Monte Carlo Tree Search with large language models to generate accurate, reliable, and efficient code-based world models for reinforcement learning, validated on a new benchmark.
Contribution
The paper presents GIF-MCTS, a new code generation method guided by MCTS, and introduces CWMB, a benchmark for evaluating code world models in RL tasks.
Findings
GIF-MCTS outperforms all baselines on CWMB and on two other benchmarks.
Code world models generated with GIF-MCTS improve RL sample efficiency.
The approach enables fast, interpretable, and reliable model-based RL agents.
Abstract
In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has potential to be more precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructions, to generate exact code with non-trivial logic and to self-debug a long program with feedback from unit tests and environment trajectories. To address these challenges, we propose Generate, Improve and Fix with Monte Carlo Tree Search (GIF-MCTS), a new code generation strategy for LLMs. To test our approach in an offline RL setting, we introduce the Code World Models Benchmark (CWMB), a suite of program synthesis and planning tasks comprised of 18 diverse RL environments paired with…
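To make the idea of a Code World Model concrete, the sketch below shows what such a model could look like for a toy environment, how it might be scored against logged environment trajectories, and how plain code can then be called for planning. It is only an illustration under stated assumptions: the corridor environment, the class and function names, and the brute-force planner are all hypothetical and are not taken from the paper, GIF-MCTS, or CWMB.

```python
from itertools import product
from typing import List, Tuple

# One logged transition: (state, action, next_state, reward, done).
Transition = Tuple[int, int, int, float, bool]


class CorridorWorldModel:
    """Hypothetical code world model for a toy 1-D corridor (not from the paper).

    States are integer positions 0..goal; actions are 0 (left) and 1 (right).
    """

    def __init__(self, goal: int = 4) -> None:
        self.goal = goal

    def step(self, state: int, action: int) -> Tuple[int, float, bool]:
        """Predict (next_state, reward, done) for a single transition."""
        move = 1 if action == 1 else -1
        next_state = min(max(state + move, 0), self.goal)  # walls clamp the position
        done = next_state == self.goal
        reward = 1.0 if done else 0.0
        return next_state, reward, done


def transition_accuracy(model: CorridorWorldModel,
                        transitions: List[Transition]) -> float:
    """Fraction of logged transitions that the model reproduces exactly."""
    if not transitions:
        return 0.0
    correct = sum(model.step(s, a) == (s2, r, d) for s, a, s2, r, d in transitions)
    return correct / len(transitions)


def plan(model: CorridorWorldModel, state: int, horizon: int) -> List[int]:
    """Brute-force planner (an assumption made for brevity): try every action
    sequence of length `horizon` and keep the one with the highest predicted return."""
    best_return, best_actions = float("-inf"), []
    for actions in product((0, 1), repeat=horizon):
        s, total = state, 0.0
        for a in actions:
            s, r, done = model.step(s, a)
            total += r
            if done:
                break
        if total > best_return:
            best_return, best_actions = total, list(actions)
    return best_actions


if __name__ == "__main__":
    model = CorridorWorldModel(goal=4)
    logged = [(0, 1, 1, 0.0, False), (3, 1, 4, 1.0, True), (0, 0, 0, 0.0, False)]
    print(transition_accuracy(model, logged))
    print(plan(model, state=2, horizon=2))
```

Because the model is ordinary Python, each planning step is a cheap function call rather than an LLM query, and the accuracy score against logged transitions gives the kind of feedback signal that a generate-and-fix loop could use.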
Taxonomy
Topics: Natural Language Processing Techniques · Software Engineering Research · Speech and dialogue systems
