WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes

Jichen Hu; Jiawei Guo; Jiazhong Cen; Chen Yang; Sikuang Li; Wei Shen

arXiv:2605.15843 · cs.CV · May 18, 2026

TL;DR

WorldAct converts static, monolithic generated 3D worlds into interactive, editable scenes by decomposing the scene, reconstructing object-level meshes, and restoring the background via 3D inpainting, so that individual objects can be edited and manipulated.

Contribution

Introduces WorldAct, a framework that converts static 3D worlds into interaction-ready scenes with object-level editing and manipulation capabilities.

Findings

1. Enables object-level editing and collision-aware manipulation.
2. Supports embodied task execution within reconstructed scenes.
3. Preserves global scene coherence after modification.

Abstract

Recent 3D world modeling systems based on generative scene synthesis, such as Marble, can create coherent and explorable 3D environments, yet their outputs are typically static monolithic assets with limited editability and physical interaction. This restricts their use in immersive content creation and embodied simulation, where generated worlds must be actively modified and manipulated. To tackle this challenge, we present WorldAct, a framework that converts static generated 3D worlds into editable and interaction-ready scenes. WorldAct uses a multimodal agent to guide scene decomposition, identify actionable objects, reconstruct geometrically aligned object-level meshes for interaction, and restore the residual background via 3D inpainting. The resulting scenes support object-level editing, collision-aware manipulation, and embodied task execution while preserving global scene…
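The stages named in the abstract (agent-guided decomposition, selection of actionable objects, object-level mesh reconstruction, and background restoration via 3D inpainting) can be read as a simple pipeline. The following is a minimal illustrative sketch of that ordering only, not the authors' implementation: `activate_world`, `SceneObject`, `ActivatedScene`, and the four callables are hypothetical names introduced here, and the toy lambdas stand in for the multimodal agent, the mesh reconstructor, and the inpainting model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SceneObject:
    """Hypothetical record for one object extracted from the monolithic world."""
    name: str
    mesh: str  # placeholder for a handle to a reconstructed, aligned mesh


@dataclass
class ActivatedScene:
    """Hypothetical output: separate objects plus a restored background."""
    objects: List[SceneObject] = field(default_factory=list)
    background: str = ""


def activate_world(
    world: str,
    decompose: Callable[[str], List[str]],
    is_actionable: Callable[[str], bool],
    reconstruct_mesh: Callable[[str, str], str],
    inpaint_background: Callable[[str, List[str]], str],
) -> ActivatedScene:
    """Run the four stages in the order the abstract lists them (assumed)."""
    # 1. Scene decomposition: propose candidate objects in the static world.
    candidates = decompose(world)
    # 2. Keep only the objects judged actionable.
    actionable = [name for name in candidates if is_actionable(name)]
    # 3. Reconstruct an object-level mesh for each actionable object.
    objects = [SceneObject(name, reconstruct_mesh(world, name)) for name in actionable]
    # 4. Restore the residual background where objects were removed.
    background = inpaint_background(world, actionable)
    return ActivatedScene(objects=objects, background=background)


# Toy stand-ins so the sketch runs end to end; real stages would be models.
scene = activate_world(
    world="living_room",
    decompose=lambda w: ["chair", "wall", "mug"],
    is_actionable=lambda name: name != "wall",
    reconstruct_mesh=lambda w, name: f"{name}.mesh",
    inpaint_background=lambda w, removed: f"{w} minus {sorted(removed)}",
)
print([obj.name for obj in scene.objects])
print(scene.background)
```

Passing each stage as a callable keeps the sketch agnostic about how any stage is actually realized, which the abstract does not specify.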

Code & Models

Repositories

sjtu-deepvisionlab/WorldAct (GitHub)
