Context-Nav: Context-Driven Exploration and Viewpoint-Aware 3D Spatial Reasoning for Instance Navigation
Won Shik Jang, Ue-Hwan Kim

TL;DR
Context-Nav introduces a geometry-grounded, viewpoint-aware 3D spatial reasoning approach for text-goal instance navigation, improving exploration and disambiguation without task-specific training.
Contribution
The paper presents a framework that combines global exploration guided by dense text-image alignments with viewpoint-aware 3D spatial verification, achieving state-of-the-art results without task-specific training or fine-tuning.
Findings
State-of-the-art performance on InstanceNav and CoIN-Bench datasets.
Encoding full captions into the value map improves exploration efficiency.
Viewpoint-aware 3D verification reduces incorrect stopping in navigation.
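To make the exploration finding concrete, here is a minimal sketch of ranking frontiers with a value map. It assumes the map is a 2D grid whose cells already hold caption-to-image alignment scores, and that each frontier is scored by the mean value in a small window around it. The names `frontier_score`, `rank_frontiers`, and `radius` are hypothetical, and the paper's actual scoring rule is not given in this summary.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) index into the value map


def frontier_score(value_map: Sequence[Sequence[float]], cell: Cell, radius: int = 1) -> float:
    """Mean alignment score in a square window around a frontier cell (assumed rule)."""
    rows, cols = len(value_map), len(value_map[0])
    r0, c0 = cell
    window = [
        value_map[r][c]
        for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))
    ]
    return sum(window) / len(window)


def rank_frontiers(
    value_map: Sequence[Sequence[float]], frontiers: List[Cell], radius: int = 1
) -> List[Cell]:
    """Return frontiers ordered from most to least consistent with the caption."""
    return sorted(
        frontiers,
        key=lambda cell: frontier_score(value_map, cell, radius),
        reverse=True,
    )
```

The agent would then head for the first frontier in the returned list, so regions that match the whole description are explored before regions that merely contain an early detection.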
Abstract
Text-goal instance navigation (TGIN) asks an agent to resolve a single, free-form description into actions that reach the correct object instance among same-category distractors. We present Context-Nav, which elevates long, contextual captions from a local matching cue to a global exploration prior and verifies candidates through 3D spatial reasoning. First, we compute dense text-image alignments for a value map that ranks frontiers -- guiding exploration toward regions consistent with the entire description rather than early detections. Second, upon observing a candidate, we perform a viewpoint-aware relation check: the agent samples plausible observer poses, aligns local frames, and accepts a target only if the spatial relations can be satisfied from at least one viewpoint. The pipeline requires no task-specific training or fine-tuning; we attain state-of-the-art performance…
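The viewpoint-aware relation check can be sketched as follows. This is a simplified 2D illustration, not the authors' implementation: it assumes observer poses are sampled on a circle around the candidate and face it, and that relations such as `left_of` are judged in the observer's local frame with a fixed margin. The function names, the relation vocabulary, and the parameters `n_views`, `radius`, and `margin` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Relation = Tuple[str, str]  # (relation name, anchor object name)


def to_observer_frame(point: Point, observer: Point, heading: float) -> Tuple[float, float]:
    """Express a world point as (forward, left) coordinates of an observer pose."""
    dx, dy = point[0] - observer[0], point[1] - observer[1]
    forward = dx * math.cos(heading) + dy * math.sin(heading)
    left = -dx * math.sin(heading) + dy * math.cos(heading)
    return forward, left


def relation_holds(name: str, target_fl, anchor_fl, margin: float) -> bool:
    """Check one relation of the target with respect to an anchor in the observer frame."""
    t_fwd, t_left = target_fl
    a_fwd, a_left = anchor_fl
    if name == "left_of":
        return t_left - a_left > margin
    if name == "right_of":
        return a_left - t_left > margin
    if name == "in_front_of":  # target is closer to the observer than the anchor
        return a_fwd - t_fwd > margin
    if name == "behind":
        return t_fwd - a_fwd > margin
    raise ValueError(f"unknown relation: {name}")


def verify_candidate(
    target: Point,
    anchors: Dict[str, Point],
    relations: List[Relation],
    n_views: int = 8,
    radius: float = 2.0,
    margin: float = 0.1,
) -> bool:
    """Accept if all relations hold together from at least one sampled viewpoint."""
    for k in range(n_views):
        phi = 2.0 * math.pi * k / n_views
        observer = (target[0] + radius * math.cos(phi), target[1] + radius * math.sin(phi))
        heading = phi + math.pi  # observer faces the candidate
        target_fl = to_observer_frame(target, observer, heading)
        if all(
            relation_holds(name, target_fl, to_observer_frame(anchors[a], observer, heading), margin)
            for name, a in relations
        ):
            return True
    return False
```

Requiring every relation to hold from the same pose is what makes the check discriminative: a single left/right relation is satisfiable from some side for almost any pair of objects, whereas a conjunction such as "left of the chair and right of the lamp" rules out candidates that do not sit between the two anchors.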
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Robot Manipulation and Learning
