TL;DR
WorldAct transforms static generated 3D worlds into interactive, editable scenes. It decomposes each scene into actionable objects, reconstructs them as object-level meshes, and restores the residual background with 3D inpainting, enabling richer interaction and manipulation.
Contribution
Introduces WorldAct, a framework that converts static 3D worlds into interaction-ready scenes with object-level editing and manipulation capabilities.
Findings
Enables object-level editing and collision-aware manipulation.
Supports embodied task execution within reconstructed scenes.
Preserves global scene coherence after modification.
Abstract
Recent 3D world modeling systems based on generative scene synthesis, such as Marble, can create coherent and explorable 3D environments, yet their outputs are typically static monolithic assets with limited editability and physical interaction. This restricts their use in immersive content creation and embodied simulation, where generated worlds must be actively modified and manipulated. To tackle this challenge, we present WorldAct, a framework that converts static generated 3D worlds into editable and interaction-ready scenes. WorldAct uses a multimodal agent to guide scene decomposition, identify actionable objects, reconstruct geometrically aligned object-level meshes for interaction, and restore the residual background via 3D inpainting. The resulting scenes support object-level editing, collision-aware manipulation, and embodied task execution while preserving global scene…
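The abstract does not say how WorldAct implements "collision-aware manipulation". The sketch below only illustrates what the term can mean once a scene is split into separate objects. It assumes each reconstructed object is approximated by an axis-aligned bounding box (AABB), and `AABB`, `overlaps`, and `try_move` are hypothetical names. A move is rejected if it would make the object intersect any other object. This is a generic textbook check, not the paper's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    """Hypothetical stand-in for a reconstructed object: an axis-aligned bounding box."""
    min_xyz: tuple
    max_xyz: tuple

    def translated(self, offset):
        # Shift both corners of the box by the same 3D offset.
        return AABB(
            tuple(a + d for a, d in zip(self.min_xyz, offset)),
            tuple(b + d for b, d in zip(self.max_xyz, offset)),
        )


def overlaps(a: AABB, b: AABB) -> bool:
    # Two boxes intersect iff their intervals overlap on every axis.
    # Strict inequalities mean boxes that merely touch do not count as colliding.
    return all(a.min_xyz[i] < b.max_xyz[i] and b.min_xyz[i] < a.max_xyz[i] for i in range(3))


def try_move(scene: dict, name: str, offset) -> bool:
    """Move object `name` by `offset` unless it would collide with another object.

    Returns True and updates `scene` if the move is collision-free,
    otherwise returns False and leaves `scene` unchanged.
    """
    candidate = scene[name].translated(offset)
    for other_name, other_box in scene.items():
        if other_name != name and overlaps(candidate, other_box):
            return False
    scene[name] = candidate
    return True
```

For example, with a unit-sized `cup` at the origin and a `book` spanning x = 2 to 3, shifting the cup 1.5 units along x is rejected because the boxes would intersect. Lifting it 2 units along z succeeds. Real systems typically use tighter mesh or convex-hull collision tests. AABBs are used here only because they keep the idea visible in a few lines.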
