SceneOrchestra: Efficient Agentic 3D Scene Synthesis via Full Tool-Call Trajectory Generation
Yun He, Kelin Yu, Matthias Zwicker

TL;DR
SceneOrchestra is a trainable orchestration framework for agentic 3D scene synthesis. It generates full tool-call trajectories instead of relying on step-by-step review loops, and reports higher scene quality with less runtime.
Contribution
The paper presents a trainable orchestration method that generates full tool-call trajectories, improving efficiency and scene quality over heuristic-based orchestration.
Findings
Achieves state-of-the-art scene quality.
Reduces runtime compared to previous methods.
Eliminates the need for step-by-step review loops.
Abstract
Recent agentic frameworks for 3D scene synthesis have advanced realism and diversity by integrating heterogeneous generation and editing tools. These tools are organized into workflows orchestrated by an off-the-shelf LLM. Current approaches typically adopt an execute-review-reflect loop: at each step, the orchestrator executes a tool, renders intermediate results for review, and then decides on the tool and its parameters for the next step. However, this design has two key limitations. First, next-step tool selection and parameter configuration are driven by heuristic rules, which can lead to suboptimal execution flows, unnecessary tool invocations, degraded output quality, and increased runtime. Second, rendering and reviewing intermediate results after each step introduces additional latency. To address these issues, we propose SceneOrchestra, a trainable orchestration framework that…
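As a rough illustration of the control flow the abstract contrasts, the sketch below places an execute-review-reflect loop next to the execution of a complete tool-call plan. It is a toy model under stated assumptions, not the authors' implementation: `ToolCall`, `execute_review_reflect`, `execute_full_trajectory`, the `add_object` tool, and the hand-written `PLAN` are all hypothetical names, and the scene is reduced to a list of object names. How SceneOrchestra actually produces its trajectory is not covered by the truncated abstract, so the plan here is simply scripted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Scene = List[str]  # toy stand-in for a 3D scene: names of placed objects


@dataclass
class ToolCall:
    """One tool invocation: a tool name plus its parameters (hypothetical)."""
    tool: str
    params: Dict[str, str] = field(default_factory=dict)


Tool = Callable[..., Scene]


def execute_review_reflect(
    choose_next: Callable[[Scene, Optional[str]], Optional[ToolCall]],
    review: Callable[[Scene], str],
    tools: Dict[str, Tool],
    max_steps: int = 10,
) -> Tuple[Scene, int]:
    """Per-step loop: run a tool, review the result, then pick the next call."""
    scene: Scene = []
    feedback: Optional[str] = None
    review_count = 0
    for _ in range(max_steps):
        call = choose_next(scene, feedback)  # decision depends on the last review
        if call is None:
            break
        scene = tools[call.tool](scene, **call.params)
        feedback = review(scene)  # stands in for rendering and reviewing
        review_count += 1
    return scene, review_count


def execute_full_trajectory(plan: List[ToolCall], tools: Dict[str, Tool]) -> Scene:
    """Run a complete, pre-generated sequence of tool calls with no reviews."""
    scene: Scene = []
    for call in plan:
        scene = tools[call.tool](scene, **call.params)
    return scene


def add_object(scene: Scene, name: str) -> Scene:
    """Toy tool: return a new scene with one more object."""
    return scene + [name]


TOOLS: Dict[str, Tool] = {"add_object": add_object}

# Hand-written plan; a trained orchestrator would generate this instead.
PLAN = [
    ToolCall("add_object", {"name": "bed"}),
    ToolCall("add_object", {"name": "lamp"}),
]


def scripted_chooser(scene: Scene, feedback: Optional[str]) -> Optional[ToolCall]:
    """Toy per-step policy that replays PLAN one call at a time."""
    return PLAN[len(scene)] if len(scene) < len(PLAN) else None


def count_review(scene: Scene) -> str:
    """Toy reviewer: a real system would render the scene and critique it."""
    return f"{len(scene)} object(s) placed"


loop_scene, loop_reviews = execute_review_reflect(scripted_chooser, count_review, TOOLS)
plan_scene = execute_full_trajectory(PLAN, TOOLS)
```

In this toy setup both paths build the same scene, but the loop pays for one review per tool call while the full-trajectory path performs none. That per-step review is the latency the abstract attributes to rendering and reviewing intermediate results.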
