SpatialLM: Training Large Language Models for Structured Indoor Modeling

Yongsen Mao; Junhao Zhong; Chuan Fang; Jia Zheng; Rui Tang; Hao Zhu; Ping Tan; Zihan Zhou

arXiv:2506.07491 · cs.CV · November 6, 2025

TL;DR

SpatialLM is a large language model trained on synthetic 3D indoor scene data, achieving state-of-the-art layout estimation and competitive 3D object detection, advancing spatial understanding for AR and robotics.

Contribution

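Rather than a task-specific network design, SpatialLM follows the standard multimodal LLM architecture and is fine-tuned directly from open-source LLMs on a large synthetic dataset of indoor point clouds. On public benchmarks it reaches state-of-the-art layout estimation and competitive 3D object detection.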

Findings

01 State-of-the-art in layout estimation

02 Competitive results in 3D object detection

03 Demonstrates feasibility of LLMs for spatial understanding

Abstract

SpatialLM is a large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. These outputs include architectural elements like walls, doors, windows, and oriented object boxes with their semantic categories. Unlike previous methods which exploit task-specific network designs, our model adheres to the standard multimodal LLM architecture and is fine-tuned directly from open-source LLMs. To train SpatialLM, we collect a large-scale, high-quality synthetic dataset consisting of the point clouds of 12,328 indoor scenes (54,778 rooms) with ground-truth 3D annotations, and conduct a careful study on various modeling and training decisions. On public benchmarks, our model gives state-of-the-art performance in layout estimation and competitive results in 3D object detection. With that, we show a feasible path for enhancing the spatial…
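To make the output described above concrete, here is a minimal sketch of how walls, doors, windows, and oriented object boxes could be held in memory and flattened into text that a language model might emit. Every class name, field, and the line-based text format below is a hypothetical illustration, not the schema or serialization used by the SpatialLM authors.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# All types below are hypothetical; the paper's actual schema may differ.
Point3 = Tuple[float, float, float]


@dataclass
class Wall:
    start: Point3   # one end of the wall's base line
    end: Point3     # other end of the wall's base line
    height: float


@dataclass
class Opening:
    kind: str        # "door" or "window"
    wall_index: int  # index of the wall this opening belongs to
    center: Point3
    width: float
    height: float


@dataclass
class OrientedBox:
    category: str    # semantic category, e.g. "sofa"
    center: Point3
    size: Point3     # extents along the box's own axes
    yaw: float       # rotation about the vertical axis, in radians


@dataclass
class Scene:
    walls: List[Wall] = field(default_factory=list)
    openings: List[Opening] = field(default_factory=list)
    objects: List[OrientedBox] = field(default_factory=list)


def fmt(p: Point3) -> str:
    """Format a 3D point as comma-separated values with two decimals."""
    return ",".join(f"{v:.2f}" for v in p)


def to_text(scene: Scene) -> str:
    """Flatten a scene into one text line per element (assumed format)."""
    lines = []
    for i, w in enumerate(scene.walls):
        lines.append(f"wall_{i}=Wall({fmt(w.start)},{fmt(w.end)},{w.height:.2f})")
    for i, o in enumerate(scene.openings):
        lines.append(
            f"{o.kind}_{i}=Opening(wall_{o.wall_index},{fmt(o.center)},"
            f"{o.width:.2f},{o.height:.2f})"
        )
    for i, b in enumerate(scene.objects):
        lines.append(
            f"bbox_{i}=Bbox({b.category},{fmt(b.center)},{b.yaw:.2f},{fmt(b.size)})"
        )
    return "\n".join(lines)


scene = Scene(
    walls=[Wall((0, 0, 0), (4, 0, 0), 2.8)],
    openings=[Opening("door", 0, (1, 0, 1), 0.9, 2.0)],
    objects=[OrientedBox("sofa", (2, 1, 0.4), (1.8, 0.9, 0.8), 1.5708)],
)
print(to_text(scene))
```

A flat, line-per-element text form like this is one plausible way to let a standard LLM decoder produce structured 3D output without a task-specific head; the real model may use a different encoding.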

Videos

SpatialLM: Training Large Language Models for Structured Indoor Modeling · SlidesLive

Taxonomy

Topics: 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization