UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Yuan Pu, Yazhe Niu, Zhenjie Yang, Jiyuan Ren, Hongsheng Li, and Yu Liu

arXiv:2406.10667 · cs.LG · January 6, 2025

TL;DR

UniZero introduces a scalable, modular transformer-based world model for reinforcement learning that improves long-horizon planning and multitask learning, matching or outperforming existing methods across long-term memory, multitask Atari, and single-task benchmarks.

Contribution

The paper presents UniZero, a modular transformer-based world model that learns a shared latent space, enabling efficient planning and scaling to heterogeneous RL scenarios.

Findings

01. Outperforms baselines in long-term memory benchmarks

02. Demonstrates superior scalability in multitask Atari learning

03. Matches or surpasses state-of-the-art in single-task RL

Abstract

Learning predictive world models is crucial for enhancing the planning capabilities of reinforcement learning (RL) agents. Recently, MuZero-style algorithms, leveraging the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, these methods struggle to scale in heterogeneous scenarios with diverse dependencies and task variability. To overcome these limitations, we introduce UniZero, a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space. By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in the latent space. We show that UniZero significantly outperforms…
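To make "conditioned on the learned latent history" concrete, the sketch below shows one way a transformer world model could restrict each prediction to past tokens. It is a minimal illustration, not the authors' implementation: the alternating state/action token layout and the helper names `interleave_tokens`, `causal_mask`, and `visible_history` are assumptions introduced here.

```python
from typing import List, Tuple

# (kind, timestep); a hypothetical token layout used only for illustration.
Token = Tuple[str, int]


def interleave_tokens(num_steps: int) -> List[Token]:
    """Lay out a trajectory as alternating latent-state and action tokens."""
    tokens: List[Token] = []
    for t in range(num_steps):
        tokens.append(("state", t))
        tokens.append(("action", t))
    return tokens


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_history(
    tokens: List[Token], mask: List[List[bool]], position: int
) -> List[Token]:
    """Return the tokens that a prediction made at `position` can condition on."""
    return [tok for tok, allowed in zip(tokens, mask[position]) if allowed]


tokens = interleave_tokens(3)
mask = causal_mask(len(tokens))

# A prediction made at the action token of step 1 (position 3) sees
# only the history up to and including that token.
history = visible_history(tokens, mask, position=3)
print(history)
# [('state', 0), ('action', 0), ('state', 1), ('action', 1)]
```

Under this assumed layout, quantities such as the next latent state, reward, policy, and value would be read out at each position from the masked history, which is what lets the dynamics and the decision-oriented predictions share one sequence model.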

Code & Models

Repositories

opendilab/LightZero
PyTorch · Official

Taxonomy

Topics: AI-based Problem Solving and Planning · Robotic Path Planning Algorithms · Machine Learning and Algorithms