Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
Yerim Jeon, Miso Lee, WonJun Moon, and Jae-Pil Heo

TL;DR
This paper introduces 3D-SLIM, an attention-masking strategy for LLM-based 3D scene-language understanding. It replaces the decoder's causal mask with a mask that follows the spatial structure of the scene, which improves 3D reasoning without adding parameters.
Contribution
The paper proposes 3D-SLIM, an adaptive attention masking method that replaces causal masks with spatially-aware masks, enabling better 3D reasoning in LLMs without architectural changes.
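To make the contrast between a causal mask and a spatially-aware mask concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function `knn_spatial_mask`, the parameter `k`, and the toy object centers are all hypothetical, and a simple k-nearest-neighbor rule stands in for whatever geometry-based criterion 3D-SLIM actually uses.

```python
import math

def causal_mask(n):
    """Standard decoder mask: token i may attend only to tokens 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def knn_spatial_mask(centers, k):
    """Hypothetical stand-in for a geometry-based mask (an assumption,
    not the paper's rule): each object token may attend to itself and
    to its k nearest objects, regardless of position in the sequence."""
    n = len(centers)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        # Rank the other objects by Euclidean distance to object i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centers[i], centers[j]),
        )
        mask[i][i] = True
        for j in others[:k]:
            mask[i][j] = True
    return mask

# Toy scene (made-up coordinates): objects 0 and 2 sit close together,
# as do objects 1 and 3.
centers = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 0.5, 0.0)]

causal = causal_mask(len(centers))
spatial = knn_spatial_mask(centers, k=1)

# Under the causal mask, object 0 cannot see object 2 because it comes
# later in the sequence; under the spatial mask it can, because object 2
# is its nearest neighbor.
print(causal[0][2], spatial[0][2])
```

In this sketch, reordering the objects only permutes the rows and columns of the spatial mask, so which objects attend to each other does not change. The causal mask, by contrast, depends entirely on the arbitrary token order.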
Findings
3D-SLIM yields substantial performance improvements across multiple benchmarks.
Its spatially-aware attention is effective without any additional parameters.
The method is validated across diverse 3D scene-language tasks.
Abstract
Recent advances in 3D scene-language understanding have leveraged Large Language Models (LLMs) for 3D reasoning by transferring their general reasoning ability to 3D multi-modal contexts. However, existing methods typically adopt standard decoders from language modeling, which rely on a causal attention mask. This design introduces two fundamental conflicts in 3D scene understanding: sequential bias among order-agnostic 3D objects and restricted object-instruction attention, hindering task-specific reasoning. To overcome these limitations, we propose 3D Spatial Language Instruction Mask (3D-SLIM), an effective masking strategy that replaces the causal mask with an adaptive attention mask tailored to the spatial structure of 3D scenes. Our 3D-SLIM introduces two key components: a Geometry-adaptive Mask that constrains attention based on spatial density rather than token order, and an…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · 3D Shape Modeling and Analysis
