Coarse Semantic Injection for LLM-Conditioned Structured Indoor Prediction

Shuliang Zhu; Tomiwa Adey; Jinjia Zhou

arXiv:2605.16832·cs.CV·May 19, 2026

TL;DR

This paper introduces a semantic augmentation method for LLM-based indoor scene understanding that improves the detection of thin structural elements (such as doors and windows) and furniture in cluttered 3D point clouds.

Contribution

It proposes an RGBB color code that injects coarse semantic information (furniture, walls, openings, and others) into the point-cloud input of an LLM-conditioned structured decoder, improving decoding accuracy.

Findings

1. Improved metrics for opening localization and furniture detection.
2. Semantic augmentation yields gains across multiple datasets.
3. Ablation studies clarify the roles of the semantic source and the color coding.

Abstract

Large language models (LLMs) have recently been used as structured decoders for indoor understanding from 3D point-token inputs. However, point cloud encoders often under-represent thin structural elements such as doors and windows after voxelization and sparse pooling, and may miss individual furniture instances in cluttered scenes. We propose an interface-preserving semantic augmentation for LLM-conditioned structured decoding. The key idea is to associate semantic evidence with the point-cloud representation, reduce it to a coarse four-group code (furniture, walls, openings, and others), and encode it as an RGBB point interface: red for furniture, green for walls, blue for openings, and black for others, where RGBB denotes four semantic color states represented in three RGB channels rather than an additional fourth channel. This semantic color code is appended to the original raw…
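The four-group reduction and RGBB color mapping described in the abstract can be illustrated with a small sketch. It is a hypothetical illustration, not the authors' implementation: the fine-grained label vocabulary (`FINE_TO_GROUP`), the helper names `to_group` and `append_rgbb`, and the choice to concatenate the 3-channel code after each point's existing features are all assumptions, since the abstract is truncated before it specifies the exact layout. Only the four groups and their colors (red, green, blue, black, held in three RGB channels) come from the source.

```python
from typing import Dict, List, Sequence, Tuple

# Coarse group ids for the four groups named in the abstract.
FURNITURE, WALL, OPENING, OTHER = 0, 1, 2, 3

# RGBB palette: four color states stored in three RGB channels (no fourth channel).
GROUP_COLORS: Dict[int, Tuple[float, float, float]] = {
    FURNITURE: (1.0, 0.0, 0.0),  # red
    WALL: (0.0, 1.0, 0.0),       # green
    OPENING: (0.0, 0.0, 1.0),    # blue
    OTHER: (0.0, 0.0, 0.0),      # black
}

# Hypothetical fine-grained vocabulary; a real label set depends on the semantic source.
FINE_TO_GROUP: Dict[str, int] = {
    "chair": FURNITURE, "table": FURNITURE, "sofa": FURNITURE, "bed": FURNITURE,
    "wall": WALL,
    "door": OPENING, "window": OPENING,
}


def to_group(label: str) -> int:
    """Reduce a fine-grained label to a coarse group; unknown labels fall into 'others'."""
    return FINE_TO_GROUP.get(label, OTHER)


def append_rgbb(points: Sequence[Sequence[float]], labels: Sequence[str]) -> List[List[float]]:
    """Append a 3-channel RGBB code to each point's features (assumed concatenation layout)."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    return [list(p) + list(GROUP_COLORS[to_group(lab)]) for p, lab in zip(points, labels)]


if __name__ == "__main__":
    # Each point: xyz plus an assumed raw rgb triple.
    pts = [[0.0, 0.0, 0.0, 0.5, 0.5, 0.5] for _ in range(4)]
    out = append_rgbb(pts, ["chair", "wall", "door", "ceiling"])
    for row in out:
        print(row[6:])  # the appended RGBB code
```

Running the script prints the appended codes `[1.0, 0.0, 0.0]`, `[0.0, 1.0, 0.0]`, `[0.0, 0.0, 1.0]`, and `[0.0, 0.0, 0.0]` for the chair, wall, door, and ceiling points respectively. Encoding the four states as pure primaries plus black keeps the code within the familiar three-channel RGB slot, which is consistent with the abstract's "interface-preserving" framing.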
