Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models

Sagar Bharadwaj; Ziyong Ma; Anurag Ghosh; Srinivasan Seshan; Anthony Rowe

arXiv:2605.09218·cs.CV·May 12, 2026

TL;DR

Flame3D is a training-free framework enabling zero-shot, compositional reasoning about 3D scenes by using external tools and large language models, without requiring 3D-specific training.

Contribution

The paper introduces Flame3D, an inference-time approach that constructs editable 3D scene memories and synthesizes spatial programs for open-ended reasoning.

Findings

01. Competitive performance on ScanQA without training

02. Essential role of synthesized spatial operations in reasoning

03. Effective reasoning over layouts and objects not in the scene

Abstract

3D scene understanding spans reasoning about free space, object grounding, hypothetical object insertions, complex geometric relationships, and integrating all of these with external tools and data sources. Existing 3D understanding methods typically rely on large-scale 3D-language training or focus on object grounding and simple spatial relationships. We argue that the broad generalization that motivates 3D-language training can be achieved at inference time, without 3D-specific training. We propose Flame3D, a training-free framework that represents scenes as editable visual-textual 3D memories and exposes them to an off-the-shelf MLLM through composable spatial tools. Flame3D also lets the agent synthesize custom spatial programs at inference time, enabling open-ended reasoning over layouts, empty space, and objects not yet present in the scene. External data and corrections can be…
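To make the idea of an editable scene memory with composable spatial tools more concrete, here is a minimal Python sketch. It is not Flame3D's implementation or API: `SceneObject`, `SceneMemory`, `center_distance`, and `fits_without_collision` are hypothetical names, and the axis-aligned box representation is an assumption chosen for brevity. It only illustrates how a memory that can be edited, plus small geometric tools, would let an agent test a hypothetical object insertion against free space.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class SceneObject:
    """Hypothetical memory entry: a labelled axis-aligned box (assumption)."""
    label: str
    center: tuple[float, float, float]  # (x, y, z) in metres
    size: tuple[float, float, float]    # full extents along x, y, z

    def bounds(self):
        """Return (min_corner, max_corner) of the box."""
        lo = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        hi = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return lo, hi


@dataclass
class SceneMemory:
    """Editable store of objects; entries can be added, replaced, or removed."""
    objects: dict[str, SceneObject] = field(default_factory=dict)

    def upsert(self, name, obj):
        self.objects[name] = obj

    def remove(self, name):
        self.objects.pop(name, None)


def center_distance(mem, a, b):
    """Spatial tool: Euclidean distance between two object centers."""
    return dist(mem.objects[a].center, mem.objects[b].center)


def boxes_overlap(a, b):
    """True if the two boxes intersect with positive volume."""
    (alo, ahi), (blo, bhi) = a.bounds(), b.bounds()
    return all(alo[i] < bhi[i] and blo[i] < ahi[i] for i in range(3))


def fits_without_collision(mem, candidate):
    """Spatial tool: can a not-yet-present object be placed in free space?"""
    return not any(boxes_overlap(candidate, o) for o in mem.objects.values())


mem = SceneMemory()
mem.upsert("sofa", SceneObject("sofa", (1.0, 0.5, 0.4), (2.0, 1.0, 0.8)))
mem.upsert("table", SceneObject("table", (1.0, 2.0, 0.25), (1.0, 0.6, 0.5)))

# Two hypothetical lamp placements: one beside the sofa, one inside it.
lamp = SceneObject("lamp", (3.0, 0.5, 0.75), (0.4, 0.4, 1.5))
blocked = SceneObject("lamp", (1.5, 0.5, 0.75), (0.4, 0.4, 1.5))

print(fits_without_collision(mem, lamp))     # free space beside the sofa
print(fits_without_collision(mem, blocked))  # collides with the sofa
```

Because the memory is a plain editable store, a correction such as removing the sofa changes the answer for the second placement without any retraining, which is the kind of inference-time behaviour the abstract describes.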
