CATNAV: Cached Vision-Language Traversability for Efficient Zero-Shot Robot Navigation
Aditya Potnis, Francisco Affonso, Shreya Gummadi, Naveen Kumar Uppalapati, Girish Chowdhary

TL;DR
CATNAV is a framework that uses multimodal large language models and a visuosemantic cache for embodiment-aware, zero-shot robot navigation in unstructured environments. The cache cuts online VLM queries by 85.7%, and the reported results show fewer behavioral violations alongside a higher goal-reaching rate.
Contribution
The paper introduces a cost-aware, zero-shot navigation method that uses multimodal LLMs for risk assessment and path selection, together with a visuosemantic cache that reuses prior assessments, all without task-specific training.
Findings
Achieves 10% higher goal-reaching rate
Reduces behavioral violations by 33%
Cuts online VLM queries by 85.7%
Abstract
Navigating unstructured environments requires assessing traversal risk relative to a robot's physical capabilities, a challenge that varies across embodiments. We present CATNAV, a cost-aware traversability navigation framework that leverages multimodal LLMs for zero-shot, embodiment-aware costmap generation without task-specific training. We introduce a visuosemantic caching mechanism that detects scene novelty and reuses prior risk assessments for semantically similar frames, reducing online VLM queries by 85.7%. Furthermore, we introduce a VLM-based trajectory selection module that evaluates proposals through visual reasoning to choose the safest path given behavioral constraints. We evaluate CATNAV on a quadruped robot across indoor and outdoor unstructured environments, comparing against state-of-the-art vision-language-action baselines. Across five navigation tasks, CATNAV…
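The caching idea described in the abstract (detect scene novelty, reuse a prior risk assessment for semantically similar frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name VisuosemanticCache, the query_vlm callback, the cosine-similarity novelty test, and the 0.95 threshold are all assumptions made for illustration, and the frame embeddings are hand-written stand-ins for whatever encoder the paper actually uses.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class VisuosemanticCache:
    """Hypothetical cache: reuse a stored assessment when a frame is not novel."""

    query_vlm: Callable[[Any], str]  # assumed stand-in for the expensive VLM call
    threshold: float = 0.95          # assumed similarity cutoff for "not novel"
    entries: List[Tuple[List[float], str]] = field(default_factory=list)
    vlm_calls: int = 0

    def assess(self, frame: Any, embedding: Sequence[float]) -> str:
        # Find the most similar cached embedding, if any.
        best_sim, best_assessment = -1.0, None
        for cached_embedding, assessment in self.entries:
            sim = cosine_similarity(embedding, cached_embedding)
            if sim > best_sim:
                best_sim, best_assessment = sim, assessment

        # Cache hit: the scene is similar enough, so skip the VLM.
        if best_assessment is not None and best_sim >= self.threshold:
            return best_assessment

        # Cache miss: the scene is novel, so query the VLM and store the result.
        assessment = self.query_vlm(frame)
        self.vlm_calls += 1
        self.entries.append((list(embedding), assessment))
        return assessment


def fake_vlm(frame: Any) -> str:
    """Placeholder for a real VLM query; returns a label derived from the frame."""
    return f"risk assessment for {frame}"


cache = VisuosemanticCache(query_vlm=fake_vlm, threshold=0.95)
first = cache.assess("grass_1", [1.0, 0.0, 0.0])
second = cache.assess("grass_2", [0.99, 0.05, 0.0])  # near-duplicate scene
third = cache.assess("stairs_1", [0.0, 1.0, 0.0])    # novel scene
print(cache.vlm_calls, first == second, third)
```

In this toy run the second frame reuses the first frame's assessment, so only two of the three frames trigger a VLM call. A real system would also have to decide how to bound the cache and when to invalidate stale entries, which the abstract does not specify.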
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robotics and Sensor-Based Localization
