UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu, Yazhe Niu, Zhenjie Yang, Jiyuan Ren, Hongsheng Li, and Yu Liu

TL;DR
UniZero introduces a scalable, modular transformer-based world model for reinforcement learning that improves long-term planning and multitask learning, outperforming existing methods on long-term memory, multitask Atari, and single-task benchmarks.
Contribution
The paper presents UniZero, a modular transformer-based world model that learns a shared latent space for efficient planning and scalability in heterogeneous RL scenarios.
Findings
Outperforms baselines on long-term memory benchmarks
Demonstrates superior scalability in multitask Atari learning
Matches or surpasses the state of the art in single-task RL
Abstract
Learning predictive world models is crucial for enhancing the planning capabilities of reinforcement learning (RL) agents. Recently, MuZero-style algorithms, leveraging the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, these methods struggle to scale in heterogeneous scenarios with diverse dependencies and task variability. To overcome these limitations, we introduce UniZero, a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space. By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in the latent space. We show that UniZero significantly outperforms…
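To make the phrase "concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history" more concrete, here is a minimal NumPy sketch of one forward pass. It is an illustration under stated assumptions, not the authors' implementation: the single attention layer, the interleaving of latent and action tokens, the layer sizes, and every name (`params`, `world_model_forward`, and so on) are hypothetical, and training and MCTS are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, LATENT_DIM, NUM_ACTIONS = 8, 16, 4


def init(shape):
    """Small random weights; a real model would learn these."""
    return rng.normal(0.0, 0.1, size=shape)


# All parameter names are assumptions made for this sketch.
params = {
    "encoder": init((OBS_DIM, LATENT_DIM)),
    "action_embed": init((NUM_ACTIONS, LATENT_DIM)),
    "w_q": init((LATENT_DIM, LATENT_DIM)),
    "w_k": init((LATENT_DIM, LATENT_DIM)),
    "w_v": init((LATENT_DIM, LATENT_DIM)),
    "dynamics": init((LATENT_DIM, LATENT_DIM)),
    "reward": init((LATENT_DIM, 1)),
    "value": init((LATENT_DIM, 1)),
    "policy": init((LATENT_DIM, NUM_ACTIONS)),
}


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_self_attention(tokens, p):
    """Single-head attention in which position i only sees positions <= i."""
    q, k, v = tokens @ p["w_q"], tokens @ p["w_k"], tokens @ p["w_v"]
    scores = q @ k.T / np.sqrt(tokens.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v


def world_model_forward(observations, actions, p):
    """Predict dynamics and decision quantities from the latent history."""
    steps = observations.shape[0]
    latents = np.tanh(observations @ p["encoder"])

    # Interleave tokens as z_0, a_0, z_1, a_1, ... (an assumed layout).
    tokens = np.empty((2 * steps, LATENT_DIM))
    tokens[0::2] = latents
    tokens[1::2] = p["action_embed"][actions]

    context = causal_self_attention(tokens, p)
    latent_ctx, action_ctx = context[0::2], context[1::2]

    return {
        # Decision-oriented quantities, read before the action is seen.
        "policy": softmax(latent_ctx @ p["policy"]),
        "value": (latent_ctx @ p["value"]).squeeze(-1),
        # Latent dynamics and reward, read after the action is seen.
        "next_latent": np.tanh(action_ctx @ p["dynamics"]),
        "reward": (action_ctx @ p["reward"]).squeeze(-1),
    }


# Usage: a random 5-step trajectory.
observations = rng.normal(size=(5, OBS_DIM))
actions = rng.integers(0, NUM_ACTIONS, size=5)
out = world_model_forward(observations, actions, params)
print({name: array.shape for name, array in out.items()})
```

Because all four heads read from the same causally masked context, a single loss over their outputs would update the shared latent representation jointly, which is the property the abstract emphasizes.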
Taxonomy
Topics: AI-based Problem Solving and Planning · Robotic Path Planning Algorithms · Machine Learning and Algorithms
