EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang, Yue Hu, Yue Liao, Peng Gao, Hongsheng Li, Maoqing Yao, Guanghui Ren

TL;DR
EnerVerse is a generative foundation model for robotics that constructs and interprets embodied spaces. It efficiently predicts future embodied spaces and turns them into physical actions in complex 3D environments, helping bridge simulation and real-world applications.
Contribution
Introduces EnerVerse, a framework that combines generative modeling, a multi-view video representation, and a data engine to advance robotic manipulation and reduce the sim-to-real gap.
Findings
Achieves state-of-the-art performance in simulation and real-world tasks.
Runs at approximately 280 ms per 8-step action chunk on a single GPU.
Models 3D robotic environments effectively through a multi-view video representation.
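The latency figure above implies a per-step cost that is easy to make explicit. The short calculation below assumes that every action in an 8-step chunk is executed and that chunks are generated back to back with no overlap between inference and execution, so the resulting rate is an upper bound on throughput under those assumptions, not a result reported by the authors. The function names are illustrative only.

```python
# Figure from the Findings list: ~280 ms to produce one 8-step action chunk.
CHUNK_LATENCY_MS = 280.0
STEPS_PER_CHUNK = 8


def amortized_step_latency_ms(chunk_latency_ms: float, steps: int) -> float:
    """Average inference time attributable to each action step in a chunk."""
    return chunk_latency_ms / steps


def chunk_rate_hz(chunk_latency_ms: float) -> float:
    """Chunks per second if generation runs back to back."""
    return 1000.0 / chunk_latency_ms


def max_action_rate_hz(chunk_latency_ms: float, steps: int) -> float:
    """Upper bound on actions per second, assuming every action in a chunk is executed."""
    return steps * chunk_rate_hz(chunk_latency_ms)


print(amortized_step_latency_ms(CHUNK_LATENCY_MS, STEPS_PER_CHUNK))  # 35.0
print(round(max_action_rate_hz(CHUNK_LATENCY_MS, STEPS_PER_CHUNK), 1))  # 28.6
```

Under these assumptions, 280 ms / 8 works out to 35 ms of inference per action step, or roughly 28.6 actions per second. Chunking is what makes this possible: one forward pass is amortized over several actions.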
Abstract
We introduce EnerVerse, a generative robotics foundation model that constructs and interprets embodied spaces. EnerVerse employs a chunk-wise autoregressive video diffusion framework to predict future embodied spaces from instructions, enhanced by a sparse context memory for long-term reasoning. To model the 3D robotics world, we adopt a multi-view video representation, providing rich perspectives to address challenges like motion ambiguity and 3D grounding. Additionally, EnerVerse-D, a data engine pipeline combining generative modeling with 4D Gaussian Splatting, forms a self-reinforcing data loop to reduce the sim-to-real gap. Leveraging these innovations, EnerVerse translates 4D world representations into physical actions via a policy head (EnerVerse-A), achieving state-of-the-art performance in both simulation and real-world tasks. For efficiency, EnerVerse-A reuses features from…
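The abstract describes chunk-wise autoregressive generation with a sparse context memory only at a high level. The sketch below illustrates the general control flow under stated assumptions. Each chunk of future frames is generated from the instruction plus a sparse subset of past frames, then appended to memory to condition the next chunk. Everything here is hypothetical: `SparseContextMemory`, `rollout`, `toy_generate_chunk`, and the recent/stride/capacity retention rule are illustrative inventions, not EnerVerse's API or its actual memory policy. The toy stub also stands in for the video diffusion model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical stand-in for one (multi-view) frame, flattened to a list of floats.
Frame = List[float]
# Hypothetical generator signature: (instruction, context frames, chunk length) -> new frames.
ChunkGenerator = Callable[[str, List[Frame], int], List[Frame]]


@dataclass
class SparseContextMemory:
    """Hypothetical sparse memory: the `recent` newest frames are kept densely,
    older frames are subsampled every `stride` indices, and at most `capacity`
    frames are returned as conditioning context."""
    recent: int = 2
    stride: int = 4
    capacity: int = 6
    frames: List[Frame] = field(default_factory=list)

    def add(self, new_frames: Sequence[Frame]) -> None:
        self.frames.extend(new_frames)

    def context(self) -> List[Frame]:
        dense_start = max(0, len(self.frames) - self.recent)
        sparse = self.frames[0:dense_start:self.stride]  # older frames, subsampled
        dense = self.frames[dense_start:]                # newest frames, kept in full
        return (sparse + dense)[-self.capacity:]         # drop the oldest if over capacity


def rollout(instruction: str, initial_obs: Sequence[Frame], generate_chunk: ChunkGenerator,
            num_chunks: int, chunk_len: int, memory: SparseContextMemory) -> List[Frame]:
    """Chunk-wise autoregression: each chunk is conditioned on the instruction and
    the sparse context, then written back to memory for the next chunk."""
    memory.add(initial_obs)
    predicted: List[Frame] = []
    for _ in range(num_chunks):
        chunk = generate_chunk(instruction, memory.context(), chunk_len)
        predicted.extend(chunk)
        memory.add(chunk)
    return predicted


def toy_generate_chunk(instruction: str, context: List[Frame], chunk_len: int) -> List[Frame]:
    """Toy stub in place of the video diffusion model: extrapolates from the last
    context frame by +1 per step and ignores the instruction."""
    last = context[-1]
    return [[v + k + 1 for v in last] for k in range(chunk_len)]


memory = SparseContextMemory()
preds = rollout("pick up the cup", [[0.0]], toy_generate_chunk, num_chunks=3, chunk_len=4, memory=memory)
print([f[0] for f in preds])             # [1.0, 2.0, ..., 12.0]
print([f[0] for f in memory.context()])  # [0.0, 4.0, 8.0, 11.0, 12.0]
```

The design point the sketch tries to capture is that the conditioning context stays bounded (here at most `capacity` frames) however long the rollout runs. This is what lets a memory of this kind support long-horizon prediction without the context growing with every chunk.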
Taxonomy
Topics: Architecture and Computational Design · Robotics and Automated Systems · Ethics and Social Impacts of AI
Methods: Diffusion
