SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

Pawat Chunhachatrachai; Gueter Josmy Faure; Hung-Ting Su; Winston H. Hsu

arXiv:2605.18209 · cs.CV · May 19, 2026

TL;DR

SpatioRoute introduces a dynamic prompt routing method for zero-shot spatial reasoning in egocentric videos, improving accuracy without additional training or 3D data.

Contribution

The paper proposes a training-free prompt routing approach for spatial question answering, with a rule-based mode (SpatioRoute-R) and an LLM-driven mode (SpatioRoute-L).

Findings

01. Achieves up to 5% accuracy improvement over fixed prompts.

02. Establishes new state-of-the-art for zero-shot spatial VQA.

03. Chain-of-Thought prompting degrades performance in this setting.

Abstract

Spatial question answering over egocentric video is a challenging task that requires Vision-Language Models (VLMs) to reason about 3D object positions, scene affordances, and directional relationships, particularly in the zero-shot setting where no task-specific fine-tuning is available. We introduce SpatioRoute, a dynamic prompt generation approach that routes each incoming question to a semantically tailored prompt template -- without any additional training, fine-tuning, or 3D sensor input. SpatioRoute operates in two complementary modes: SpatioRoute-R, a rule-based router that deterministically maps question typologies (e.g., What, Is, How, Can, Which) to specialized prompt templates; and SpatioRoute-L, an LLM-driven approach that generates task-specific prompts from the question and situational context alone, with no video input at routing time. We evaluate SpatioRoute on the SQA3D…
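
The two routing modes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract names the question types (What, Is, How, Can, Which) but gives no template wording, so every template string, the fallback template, and the function names `question_type`, `route_rule_based`, and `build_meta_prompt` are assumptions made for illustration. The rule-based function routes on the first word of the question. The second function only assembles the text-only request that an LLM-driven router might receive; the LLM call itself is left out.

```python
from typing import Dict

# Hypothetical templates: the source lists the question types only,
# not the prompt text attached to each one.
TEMPLATES: Dict[str, str] = {
    "what": "Situation: {situation}\nQuestion: {question}\n"
            "Name the object or attribute being asked about.",
    "is": "Situation: {situation}\nQuestion: {question}\n"
          "Answer yes or no.",
    "how": "Situation: {situation}\nQuestion: {question}\n"
           "Answer with a count or a short description.",
    "can": "Situation: {situation}\nQuestion: {question}\n"
           "Decide whether the action is possible and answer yes or no.",
    "which": "Situation: {situation}\nQuestion: {question}\n"
             "Answer with a direction or the chosen option.",
}

# Hypothetical fallback for question types outside the five listed above.
DEFAULT_TEMPLATE = (
    "Situation: {situation}\nQuestion: {question}\nAnswer with a short phrase."
)


def question_type(question: str) -> str:
    """Return the lowercased leading word if it is a known type, else 'other'."""
    words = question.strip().split()
    first = words[0].lower().strip("?,.!:;") if words else ""
    return first if first in TEMPLATES else "other"


def route_rule_based(question: str, situation: str) -> str:
    """Deterministically map a question to a filled-in prompt template."""
    template = TEMPLATES.get(question_type(question), DEFAULT_TEMPLATE)
    return template.format(situation=situation, question=question)


def build_meta_prompt(question: str, situation: str) -> str:
    """Assemble a text-only request for an LLM-driven router.

    No video is referenced here, matching the abstract's statement that
    routing uses the question and situational context alone. The wording
    of the request is an assumption.
    """
    return (
        "Write a prompt that helps a vision-language model answer the "
        "question below about an egocentric video.\n"
        f"Situation: {situation}\n"
        f"Question: {question}"
    )
```

Under these assumptions, `route_rule_based("Can I reach the lamp?", "I am sitting on the sofa.")` selects the `can` template, while a question starting with an unlisted word such as "Where" falls back to `DEFAULT_TEMPLATE`. The resulting string would then be passed to the VLM together with the video frames.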
