MinD: Learning A Dual-System World Model for Real-Time Planning and Implicit Risk Analysis

Xiaowei Chi; Kuangzhi Ge; Jiaming Liu; Siyuan Zhou; Peidong Jia; Zichen He; Yuzhen Liu; Tingguang Li; Lei Han; Sirui Han; Shanghang Zhang; Yike Guo

arXiv:2506.18897·cs.RO·August 21, 2025

TL;DR

MinD introduces a dual diffusion system for real-time, risk-aware robotic planning, efficiently predicting future states and identifying potential failures to enhance manipulation reliability.

Contribution

The paper presents a dual-diffusion world model, trained with a co-training strategy, for real-time, risk-aware robotic planning and failure prediction, with the aim of improving efficiency and safety.

Findings

01. Achieves 63% success on RL-Bench tasks

02. Operates at 11.3 FPS for real-time control

03. Identifies 74% of potential failures in advance

Abstract

Video Generation Models (VGMs) have become powerful backbones for Vision-Language-Action (VLA) models, leveraging large-scale pretraining for robust dynamics modeling. However, current methods underutilize their distribution modeling capabilities for predicting future states. Two challenges hinder progress: integrating generative processes into feature learning is both technically and conceptually underdeveloped, and naive frame-by-frame video diffusion is computationally inefficient for real-time robotics. To address these, we propose Manipulate in Dream (MinD), a dual-system world model for real-time, risk-aware planning. MinD uses two asynchronous diffusion processes: a low-frequency visual generator (LoDiff) that predicts future scenes and a high-frequency diffusion policy (HiDiff) that outputs actions. Our key insight is that robotic policies do not require fully denoised frames…
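The two-rate structure described in the abstract can be illustrated with a toy control loop. This is only a minimal sketch under loud assumptions, not the authors' implementation: `SlowVisualGenerator` and `FastPolicy` are hypothetical stand-ins for LoDiff and HiDiff built from random linear maps, the `slow_every` ratio is an arbitrary illustrative value, and the loop is single-threaded rather than truly asynchronous. It shows only the scheduling idea: the slow module refreshes a latent feature occasionally, while the fast module emits an action at every step, conditioned on the most recent latent.

```python
import numpy as np


class SlowVisualGenerator:
    """Hypothetical stand-in for the low-frequency visual generator (LoDiff).

    A real model would run video diffusion; here a fixed random projection
    maps the observation to a latent "future scene" feature.
    """

    def __init__(self, obs_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(scale=0.1, size=(obs_dim, latent_dim))

    def predict(self, observation):
        # Return a latent feature, not a fully rendered frame.
        return np.tanh(observation @ self.proj)


class FastPolicy:
    """Hypothetical stand-in for the high-frequency action policy (HiDiff)."""

    def __init__(self, obs_dim, latent_dim, action_dim, seed=1):
        rng = np.random.default_rng(seed)
        self.w_latent = rng.normal(scale=0.1, size=(latent_dim, action_dim))
        self.w_obs = rng.normal(scale=0.1, size=(obs_dim, action_dim))

    def act(self, latent, observation):
        # The action depends on the latest latent and the current observation.
        return latent @ self.w_latent + observation @ self.w_obs


def run_dual_rate_loop(num_steps=8, slow_every=4,
                       obs_dim=6, latent_dim=16, action_dim=3):
    """Run the fast policy every step and the slow generator every `slow_every` steps."""
    slow = SlowVisualGenerator(obs_dim, latent_dim)
    fast = FastPolicy(obs_dim, latent_dim, action_dim)
    rng = np.random.default_rng(42)  # simulated observations (assumption)

    latent = None
    slow_steps, actions = [], []
    for t in range(num_steps):
        observation = rng.normal(size=obs_dim)
        if t % slow_every == 0:
            # Low-frequency update: refresh the predicted-future latent.
            latent = slow.predict(observation)
            slow_steps.append(t)
        # High-frequency update: always produce an action.
        actions.append(fast.act(latent, observation))
    return np.stack(actions), slow_steps


if __name__ == "__main__":
    actions, slow_steps = run_dual_rate_loop()
    print("actions shape:", actions.shape)
    print("slow generator ran at steps:", slow_steps)
```

Between slow updates the policy reuses a stale latent, which is the trade-off any such two-rate design has to manage; how MinD itself couples the two diffusion processes is described in the paper, not in this sketch.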

Taxonomy

Topics: AI-based Problem Solving and Planning

Methods: Diffusion