How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning
Luyu Yang, Yutong Dai, An Yan, Viraj Prabhu, Ran Xu, Zeyuan Chen

TL;DR
This paper introduces DreamHouse, a benchmark for evaluating vision-language models' ability to generate structurally and physically valid 3D architectural artifacts, emphasizing physical reasoning over perceptual realism.
Contribution
The paper presents DreamHouse, a novel benchmark with a structured validation framework for physical generative reasoning in architectural design, grounded in real-world construction standards.
Findings
State-of-the-art VLMs show significant gaps in physical reasoning capabilities.
Existing models excel at perceptual realism but struggle with structural and code-compliant generation.
Physical validity is identified as a crucial but underexplored aspect of multimodal AI evaluation.
Abstract
The physical world is not merely visual; it is governed by rigorous structural and procedural constraints. Yet, the evaluation of vision-language models (VLMs) remains heavily skewed toward perceptual realism, prioritizing the generation of visually plausible 3D layouts, shapes, and appearances. Current benchmarks rarely test whether models grasp the step-by-step processes and physical dependencies required to actually build these artifacts, a capability essential for automating design-to-construction pipelines. To address this, we introduce DreamHouse, a novel benchmark for physical generative reasoning: the capacity to synthesize artifacts that concurrently satisfy geometric, structural, constructability, and code-compliance constraints. We ground this benchmark in residential timber-frame construction, a domain with fully codified engineering standards and objectively verifiable…
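As a rough illustration of what "concurrently satisfy" means in the abstract, the sketch below treats an artifact as valid only if it passes a check in each of the four constraint families (geometric, structural, constructability, code compliance). It is not the DreamHouse validation framework: the `Wall` type, the four check functions, the step names, and the `MAX_SPACING_IN` threshold are hypothetical placeholders chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Wall:
    """Hypothetical toy artifact: a single framed wall (not from the paper)."""
    length_in: float                # wall length in inches
    stud_positions_in: List[float]  # stud centers along the wall, in inches
    build_order: List[str]          # sequence of construction steps


# Illustrative placeholder threshold; not taken from the paper or a building code.
MAX_SPACING_IN = 16.0


def geometric_ok(wall: Wall) -> bool:
    # Every stud must lie within the extent of the wall.
    return all(0.0 <= x <= wall.length_in for x in wall.stud_positions_in)


def structural_ok(wall: Wall) -> bool:
    # Toy rule: the wall needs a stud at each end.
    xs = sorted(wall.stud_positions_in)
    return len(xs) >= 2 and xs[0] == 0.0 and xs[-1] == wall.length_in


def constructable_ok(wall: Wall) -> bool:
    # Toy rule: bottom plate before studs, studs before top plate.
    required = ["bottom_plate", "studs", "top_plate"]
    if any(step not in wall.build_order for step in required):
        return False
    indices = [wall.build_order.index(step) for step in required]
    return indices == sorted(indices)


def code_ok(wall: Wall) -> bool:
    # Toy rule: adjacent studs may not exceed the placeholder spacing.
    xs = sorted(wall.stud_positions_in)
    return all(b - a <= MAX_SPACING_IN for a, b in zip(xs, xs[1:]))


CHECKS: Dict[str, Callable[[Wall], bool]] = {
    "geometric": geometric_ok,
    "structural": structural_ok,
    "constructability": constructable_ok,
    "code_compliance": code_ok,
}


def validate(wall: Wall) -> Dict[str, bool]:
    """Run every check and report the result per constraint family."""
    return {name: check(wall) for name, check in CHECKS.items()}


def is_valid(wall: Wall) -> bool:
    """An artifact counts as valid only if all families pass at once."""
    return all(validate(wall).values())
```

Reporting results per family, rather than a single pass/fail flag, makes it visible when an artifact looks plausible geometrically but fails on spacing or build order, which is the kind of gap the benchmark is described as targeting.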
Taxonomy
Topics: Architecture and Computational Design · Design Education and Practice · BIM and Construction Integration
