VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement

Haotian Mao; Yuhan Huang; Jiatao Lin; Yang Zhao; Hui Wang; Yiheng Zhang; Yuwang Wang; Chenliang Zhou; Yan Zhang; Fangcheng Zhong; Xubo Yang

arXiv:2605.17102 · cs.GR · May 19, 2026


TL;DR

VoxScene introduces an object-centric voxel diffusion method for 3D indoor scene arrangement, ensuring collision-free, high-fidelity layouts with diverse shapes by explicitly modeling volumetric structures.

Contribution

The paper proposes a novel voxel diffusion framework that explicitly models volumetric structures for collision-free, realistic indoor scene synthesis, surpassing existing proxy-based methods.
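To illustrate why a bounding proxy can misjudge contact that an explicit volumetric representation resolves, here is a minimal sketch. It is not the paper's code: the helper names (`bbox_of`, `bboxes_overlap`, `voxels_overlap`) and the toy L-shaped desk are hypothetical, chosen only to show the difference between the two checks.

```python
import numpy as np

def bbox_of(vox):
    """Axis-aligned bounding box (inclusive min and max index) of occupied voxels."""
    idx = np.argwhere(vox)
    return idx.min(axis=0), idx.max(axis=0)

def bboxes_overlap(a, b):
    """Proxy test: do the two bounding boxes intersect on every axis?"""
    (amin, amax), (bmin, bmax) = bbox_of(a), bbox_of(b)
    return bool(np.all(amin <= bmax) and np.all(bmin <= amax))

def voxels_overlap(a, b):
    """Volumetric test: does any cell belong to both objects?"""
    return bool(np.any(a & b))

# Hypothetical toy scene on a 4x4x1 grid: an L-shaped desk and a chair
# placed in the empty region enclosed by the desk's bounding box.
desk = np.zeros((4, 4, 1), dtype=bool)
desk[0, :, 0] = True   # one arm of the L
desk[:, 0, 0] = True   # the other arm of the L
chair = np.zeros((4, 4, 1), dtype=bool)
chair[2:4, 2:4, 0] = True

proxy_hit = bboxes_overlap(desk, chair)   # boxes intersect
voxel_hit = voxels_overlap(desk, chair)   # occupied cells do not
```

In this toy case the box test reports a collision while the voxel test does not, which is the kind of ambiguity that bounding proxies introduce for non-convex furniture.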

Findings

01. Achieves state-of-the-art physical plausibility in scene layouts.

02. Ensures collision-free arrangements in complex environments.

03. Provides high-fidelity voxel grids for asset retrieval.

Abstract

We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness inevitably leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative…
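The mutual-exclusivity argument can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the function `place_sequentially` is hypothetical, the fixed boolean arrays stand in for occupancies that a generative model would produce, and masking each new object against already claimed cells is an assumption about how exclusivity could be enforced, since this excerpt does not describe the mechanism.

```python
import numpy as np

def place_sequentially(grid_shape, proposals):
    """Write objects into one shared label grid in generation order.

    proposals: list of boolean occupancy arrays of shape grid_shape.
    Returns an int grid where 0 means free and k means the k-th object.
    Each cell stores a single label, so two objects cannot share a cell.
    """
    labels = np.zeros(grid_shape, dtype=np.int32)
    for k, occ in enumerate(proposals, start=1):
        free = labels == 0      # cells not claimed by earlier objects
        kept = occ & free       # drop any cell that would collide
        labels[kept] = k
    return labels

# Hypothetical proposals on a 6x6x1 grid; the table overlaps the sofa in one cell.
shape = (6, 6, 1)
sofa = np.zeros(shape, dtype=bool)
sofa[0:3, 0:2, 0] = True
table = np.zeros(shape, dtype=bool)
table[2:5, 1:3, 0] = True

labels = place_sequentially(shape, [sofa, table])
sofa_cells = int(np.sum(labels == 1))
table_cells = int(np.sum(labels == 2))
```

Because every cell holds exactly one label, the resulting arrangement is collision-free by construction; the contested cell stays with the earlier object and the later one is trimmed.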
