LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu

TL;DR
LayoutVLM uses vision-language models to generate 3D object layouts from natural language instructions and refines them with differentiable optimization, yielding physically plausible layouts and better spatial reasoning.
Contribution
The paper introduces a novel framework that combines VLMs with differentiable optimization for 3D scene layout generation based on language instructions.
Findings
Produces physically plausible 3D layouts aligned with semantic intent
Addresses limitations of existing LLM-based and constraint-based methods
Fine-tuning VLMs enhances reasoning performance
Abstract
Spatial reasoning is a fundamental aspect of human cognition, enabling intuitive understanding and manipulation of objects in three-dimensional space. While foundation models demonstrate remarkable performance on some benchmarks, they still struggle with 3D reasoning tasks like arranging objects in space according to open-ended language instructions, particularly in dense and physically constrained environments. We introduce LayoutVLM, a framework and scene layout representation that exploits the semantic knowledge of Vision-Language Models (VLMs) and supports differentiable optimization to ensure physical plausibility. LayoutVLM employs VLMs to generate two mutually reinforcing representations from visually marked images, and a self-consistent decoding process to improve VLMs' spatial planning. Our experiments show that LayoutVLM addresses the limitations of existing LLM and…
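To make the idea of differentiable layout optimization concrete, here is a minimal sketch. It is not the paper's objective or code: the abstract does not specify either. It assumes objects are 2D discs, that some upstream model has proposed target positions, and that physical plausibility is approximated by a soft overlap penalty. The names layout_loss_and_grad and optimize_layout, and the weight w_overlap, are hypothetical.

```python
import math

def layout_loss_and_grad(pos, radii, targets, w_overlap=10.0):
    """Loss and analytic gradient for a toy 2D layout of discs.

    loss = sum_i ||p_i - t_i||^2
         + w_overlap * sum_{i<j} max(0, r_i + r_j - dist(p_i, p_j))^2
    The first term keeps objects near their proposed targets (assumed to
    come from an upstream model); the second penalizes interpenetration.
    """
    n = len(pos)
    loss = 0.0
    grad = [[0.0, 0.0] for _ in range(n)]
    # Attraction to proposed target positions.
    for i in range(n):
        dx = pos[i][0] - targets[i][0]
        dy = pos[i][1] - targets[i][1]
        loss += dx * dx + dy * dy
        grad[i][0] += 2.0 * dx
        grad[i][1] += 2.0 * dy
    # Soft overlap penalty between every pair of discs.
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy)
            pen = radii[i] + radii[j] - d  # penetration depth
            if pen > 0.0 and d > 1e-9:
                loss += w_overlap * pen * pen
                # d(pen^2)/dp_i = -2 * pen * (p_i - p_j) / d
                gx = -2.0 * w_overlap * pen * dx / d
                gy = -2.0 * w_overlap * pen * dy / d
                grad[i][0] += gx
                grad[i][1] += gy
                grad[j][0] -= gx
                grad[j][1] -= gy
    return loss, grad

def optimize_layout(pos, radii, targets, lr=0.02, steps=500):
    """Plain gradient descent on the toy layout loss."""
    pos = [list(p) for p in pos]
    for _ in range(steps):
        _, grad = layout_loss_and_grad(pos, radii, targets)
        for p, g in zip(pos, grad):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return pos

# Two discs of radius 0.5 whose proposed positions overlap heavily.
proposed = [(0.0, 0.0), (0.4, 0.0)]
radii = [0.5, 0.5]
refined = optimize_layout(proposed, radii, targets=proposed)
```

Because the overlap term is a soft penalty, the refined discs move apart but may still touch slightly; a larger w_overlap trades fidelity to the proposed positions for less interpenetration. A real system would more likely rely on an automatic differentiation library than on hand-written gradients.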
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Manufacturing Process and Optimization · 3D Shape Modeling and Analysis
