SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning
Pawat Chunhachatrachai, Gueter Josmy Faure, Hung-Ting Su, Winston H. Hsu

TL;DR
SpatioRoute introduces a dynamic prompt routing method for zero-shot spatial reasoning in egocentric videos, improving accuracy without additional training or 3D data.
Contribution
It proposes a novel, training-free prompt routing approach with rule-based and LLM-driven modes for spatial question answering.
Findings
Achieves up to a 5% accuracy improvement over fixed prompts.
Establishes a new state of the art for zero-shot spatial VQA.
Chain-of-Thought prompting degrades performance in this setting.
Abstract
Spatial question answering over egocentric video is a challenging task that requires Vision-Language Models (VLMs) to reason about 3D object positions, scene affordances, and directional relationships, particularly in the zero-shot setting where no task-specific fine-tuning is available. We introduce SpatioRoute, a dynamic prompt generation approach that routes each incoming question to a semantically tailored prompt template -- without any additional training, fine-tuning, or 3D sensor input. SpatioRoute operates in two complementary modes: SpatioRoute-R, a rule-based router that deterministically maps question typologies (e.g., What, Is, How, Can, Which) to specialized prompt templates; and SpatioRoute-L, an LLM-driven approach that generates task-specific prompts from the question and situational context alone, with no video input at routing time. We evaluate SpatioRoute on the SQA3D…
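The rule-based mode described in the abstract (SpatioRoute-R) can be pictured as a lookup from the leading question word to a prompt template. The following is a minimal sketch under stated assumptions, not the authors' implementation: the template wording, the fallback template, and the names `TEMPLATES`, `route_question`, and `build_prompt` are all hypothetical. Only the five question types (What, Is, How, Can, Which) come from the abstract.

```python
# Illustrative rule-based prompt router.
# ASSUMPTION: template texts and function names are invented for this sketch;
# the paper's actual templates are not reproduced here.

TEMPLATES = {
    "what": "Situation: {situation}\nName the object or attribute being asked about.\nQuestion: {question}",
    "is": "Situation: {situation}\nAnswer yes or no based on the scene layout.\nQuestion: {question}",
    "how": "Situation: {situation}\nAnswer with a count or a short quantity.\nQuestion: {question}",
    "can": "Situation: {situation}\nDecide whether the action is possible from this position.\nQuestion: {question}",
    "which": "Situation: {situation}\nAnswer with a direction or a single choice.\nQuestion: {question}",
}

# Hypothetical fallback for questions that match none of the five types.
DEFAULT_TEMPLATE = "Situation: {situation}\nQuestion: {question}"


def route_question(question: str) -> str:
    """Return the template key for a question, based on its first word."""
    words = question.strip().split()
    first = words[0].lower().strip("?,.") if words else ""
    return first if first in TEMPLATES else "default"


def build_prompt(question: str, situation: str) -> str:
    """Fill the routed template with the situation text and the question."""
    key = route_question(question)
    template = TEMPLATES.get(key, DEFAULT_TEMPLATE)
    return template.format(situation=situation, question=question)


if __name__ == "__main__":
    print(build_prompt("Can I reach the door without moving?", "I am sitting on the sofa."))
```

Because the mapping is deterministic, the same question always receives the same template, and routing needs only the question text. The LLM-driven mode (SpatioRoute-L) would replace `route_question` and the fixed table with a model call that generates the prompt from the question and situational context; that call is not sketched here because the abstract does not specify it.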
