MANSION: Multi-floor lANguage-to-3D Scene generatIOn for loNg-horizon tasks
Lirong Che, Shuo Wen, Shan Huang, Chuang Wang, Yuzhe Yang, Gregory Dudek, Xueqian Wang, Jian Su

TL;DR
MANSION introduces a novel framework for generating multi-floor 3D environments driven by language, enabling realistic, complex, and navigable building-scale tasks for advancing spatial reasoning in robotics.
Contribution
We present MANSION, the first language-driven system for creating multi-floor 3D environments, along with MansionWorld dataset and a scene editing agent for long-horizon tasks.
Findings
State-of-the-art agents perform poorly on MANSION environments.
MANSION provides diverse, realistic multi-floor scenes for benchmarking.
The framework supports customization of environments via open-vocabulary commands.
Abstract
Real-world robotic tasks are long-horizon and often span multiple floors, demanding rich spatial reasoning. However, existing embodied benchmarks are largely confined to single-floor in-house environments, failing to reflect the complexity of real-world tasks. We introduce MANSION, the first language-driven framework for generating building-scale, multi-floor 3D environments. Being aware of vertical structural constraints, MANSION generates realistic, navigable whole-building structures with diverse, human-friendly scenes, enabling the development and evaluation of cross-floor long-horizon tasks. Building on this framework, we release MansionWorld, a dataset of over 1,000 diverse buildings ranging from hospitals to offices, alongside a Task-Semantic Scene Editing Agent that customizes these environments using open-vocabulary commands to meet specific user needs. Benchmarking reveals…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
