PhyMix: Towards Physically Consistent Single-Image 3D Indoor Scene Generation with Implicit–Explicit Optimization

Dongli Wu, Jingyu Hu, Ka-Hei Hui, Xiaobao Wei, Chengwen Luo, Jianqiang Li, and Zhengzhe Liu

arXiv:2604.10125 · cs.CV · April 14, 2026

TL;DR

This paper introduces PhyMix, a framework that improves the physical plausibility of single-image 3D indoor scene generation by integrating feedback from a physics evaluator into both training and inference.

Contribution

The paper presents a unified Physics Evaluator benchmark covering four aspects (geometric priors, contact, stability, and deployability), along with a method that combines implicit alignment and explicit refinement for physically consistent scene generation.

Findings

01. State-of-the-art methods remain largely physics-unaware.

02. PhyMix improves physical plausibility and visual fidelity.

03. Extensive evaluations confirm superior performance.

Abstract

Existing single-image 3D indoor scene generators often produce results that look visually plausible but fail to obey real-world physics, limiting their reliability in robotics, embodied AI, and design. To examine this gap, we introduce a unified Physics Evaluator that measures four main aspects: geometric priors, contact, stability, and deployability, which are further decomposed into nine sub-constraints, establishing the first benchmark to measure physical consistency. Based on this evaluator, our analysis shows that state-of-the-art methods remain largely physics-unaware. To overcome this limitation, we further propose a framework that integrates feedback from the Physics Evaluator into both training and inference, enhancing the physical plausibility of generated scenes. Specifically, we propose PhyMix, which is composed of two complementary components: (i) implicit alignment via…
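
To make the idea of a physics check concrete, here is a minimal sketch of two such checks on a toy scene: one flags interpenetrating objects and one flags objects with no support beneath them. It is not the paper's Physics Evaluator; the abstract does not say how its nine sub-constraints are computed. The axis-aligned box representation, the names `Box`, `overlap_volume`, `is_supported`, and `evaluate`, and the tolerance value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned box: min/max corners as (x, y, z), z is up."""
    name: str
    lo: tuple
    hi: tuple


def overlap_volume(a: Box, b: Box) -> float:
    """Volume of the intersection of two boxes (0.0 if they only touch)."""
    vol = 1.0
    for i in range(3):
        extent = min(a.hi[i], b.hi[i]) - max(a.lo[i], b.lo[i])
        if extent <= 0:
            return 0.0
        vol *= extent
    return vol


def is_supported(box: Box, scene: list, floor_z: float = 0.0, tol: float = 1e-2) -> bool:
    """True if the box rests on the floor or on top of another box."""
    if abs(box.lo[2] - floor_z) <= tol:
        return True
    for other in scene:
        if other is box:
            continue
        touches = abs(box.lo[2] - other.hi[2]) <= tol
        overlaps_xy = all(
            min(box.hi[i], other.hi[i]) > max(box.lo[i], other.lo[i])
            for i in range(2)
        )
        if touches and overlaps_xy:
            return True
    return False


def evaluate(scene: list) -> dict:
    """Report interpenetrating pairs and unsupported (floating) objects."""
    penetrating = [
        (a.name, b.name)
        for i, a in enumerate(scene)
        for b in scene[i + 1:]
        if overlap_volume(a, b) > 0
    ]
    floating = [b.name for b in scene if not is_supported(b, scene)]
    return {"penetrating": penetrating, "floating": floating}


scene = [
    Box("table", (0.0, 0.0, 0.0), (1.0, 1.0, 0.8)),
    Box("lamp", (0.4, 0.4, 0.8), (0.6, 0.6, 1.2)),   # rests on the table
    Box("vase", (2.0, 2.0, 0.5), (2.2, 2.2, 0.8)),   # hangs in mid-air
    Box("chair", (0.9, 0.2, 0.0), (1.4, 0.7, 0.9)),  # intersects the table
]
report = evaluate(scene)
print(report)
```

In this toy scene the chair overlaps the table and the vase has nothing beneath it, so both are reported. A real evaluator would work on meshes rather than boxes and would need a proper stability test, which this sketch does not attempt.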
