When Object-Centric World Models Meet Policy Learning: From Pixels to Policies, and Where It Breaks
Stefano Ferraro, Akihiro Nakano, Masahiro Suzuki, Yutaka Matsuo

TL;DR
This paper explores the integration of object-centric world models with policy learning, demonstrating that while such models improve visual understanding and robustness, they face challenges in stable control due to latent drift during complex interactions.
Contribution
Introduces DLPWM, an unsupervised object-centric world model that learns from pixels, and analyzes why policies trained on its latents become unstable during multi-object interactions.
Findings
DLPWM achieves strong reconstruction and prediction performance and is robust to several out-of-distribution (OOD) visual variations.
Policies trained on DLPWM latents underperform DreamerV3 in downstream model-based control.
Latent drift during multi-object interactions causes policy instability.
Abstract
Object-centric world models (OCWM) aim to decompose visual scenes into object-level representations, providing structured abstractions that could improve compositional generalization and data efficiency in reinforcement learning. We hypothesize that explicitly disentangled object-level representations, by localizing task-relevant information, can enhance policy performance across novel feature combinations. To test this hypothesis, we introduce DLPWM, a fully unsupervised, disentangled object-centric world model that learns object-level latents directly from pixels. DLPWM achieves strong reconstruction and prediction performance, including robustness to several out-of-distribution (OOD) visual variations. However, when used for downstream model-based control, policies trained on DLPWM latents underperform compared to DreamerV3. Through latent-trajectory analyses, we identify…
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
