FloorplanVLM: A Vision-Language Model for Floorplan Vectorization
Yuanqing Liu, Ziming Yang, Yulong Li, Yue Yang

TL;DR
FloorplanVLM is a vision-language model that converts raster floorplans into structured vector graphics by treating vectorization as image-conditioned sequence modeling. It reports 92.52% external-wall IoU on its benchmark and generalizes to non-Manhattan layouts.
Contribution
The paper presents a unified 'pixels-to-sequence' approach for floorplan vectorization, along with a large-scale dataset and a new benchmark for evaluation.
Findings
Achieves 92.52% external-wall IoU on the paper's new benchmark
Demonstrates robust generalization to non-Manhattan layouts
Outperforms pixel-based and query-based methods
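For readers unfamiliar with the metric, the following is a minimal sketch of how an IoU between two wall outlines could be computed. It assumes "external-wall IoU" means the intersection-over-union of the regions enclosed by the predicted and ground-truth external wall polygons; this summary does not give the paper's exact definition. The function names and the rasterization approach (sampling pixel centres) are illustrative choices, not the authors' evaluation code.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon `poly`.

    `poly` is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_iou(pred, gt, width, height):
    """Approximate IoU of two polygons by sampling pixel centres.

    Assumption: both polygons are given in the pixel coordinates of a
    `width` x `height` image. This is a hypothetical helper.
    """
    inter = 0
    union = 0
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5
            in_pred = point_in_polygon(cx, cy, pred)
            in_gt = point_in_polygon(cx, cy, gt)
            inter += in_pred and in_gt
            union += in_pred or in_gt
    return inter / union if union else 0.0
```

Because the test works on arbitrary vertex lists, the same sketch applies to slanted (non-Manhattan) outlines; curved arcs would first need to be approximated by short line segments.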
Abstract
Converting raster floorplans into engineering-grade vector graphics is challenging due to complex topology and strict geometric constraints. To address this, we present FloorplanVLM, a unified framework that reformulates floorplan vectorization as an image-conditioned sequence modeling task. Unlike pixel-based methods that rely on fragile heuristics or query-based transformers that generate fragmented rooms, our model directly outputs structured JSON sequences representing the global topology. This 'pixels-to-sequence' paradigm enables the precise and holistic constraint satisfaction of complex geometries, such as slanted walls and curved arcs. To support this data-hungry approach, we introduce a scalable data engine: we construct a large-scale dataset (Floorplan-2M) and a high-fidelity subset (Floorplan-HQ-300K) to balance geometric diversity and pixel-level precision. We then employ a…
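To make the "pixels-to-sequence" idea concrete, here is a small sketch of what decoding and checking a structured JSON output might look like. The schema (a `walls` list of segments with `start` and `end` points) is a hypothetical stand-in, since the paper's actual JSON format is not shown in this summary; the helper names are likewise illustrative.

```python
import json

# Hypothetical model output: four wall segments, one of them slanted.
# The real FloorplanVLM schema is not given here; this is an assumption.
EXAMPLE_OUTPUT = """
{
  "walls": [
    {"type": "line", "start": [0, 0],     "end": [400, 0]},
    {"type": "line", "start": [400, 0],   "end": [500, 300]},
    {"type": "line", "start": [500, 300], "end": [0, 300]},
    {"type": "line", "start": [0, 300],   "end": [0, 0]}
  ]
}
"""


def parse_floorplan(text):
    """Decode generated text into a dict and check its basic structure."""
    plan = json.loads(text)
    for wall in plan["walls"]:
        if wall["type"] != "line":
            raise ValueError(f"unsupported wall type: {wall['type']}")
        for key in ("start", "end"):
            if len(wall[key]) != 2:
                raise ValueError(f"{key} must be an [x, y] pair")
    return plan


def is_closed_loop(walls):
    """True if every wall ends where the next one starts, cyclically."""
    n = len(walls)
    return n >= 3 and all(
        walls[i]["end"] == walls[(i + 1) % n]["start"] for i in range(n)
    )


plan = parse_floorplan(EXAMPLE_OUTPUT)
closed = is_closed_loop(plan["walls"])
```

A check like `is_closed_loop` illustrates one advantage of emitting the whole layout as a single sequence: topological constraints can be verified directly on the decoded structure rather than recovered from pixels by heuristics.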
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computational Geometry and Mesh Generation · VLSI and FPGA Design Techniques
