CATNAV: Cached Vision-Language Traversability for Efficient Zero-Shot Robot Navigation

Aditya Potnis; Francisco Affonso; Shreya Gummadi; Naveen Kumar Uppalapati; Girish Chowdhary

arXiv:2603.22800·cs.RO·March 25, 2026

TL;DR

CATNAV is a framework that uses multimodal large language models and a visuosemantic cache for efficient, embodiment-aware zero-shot robot navigation in unstructured environments, cutting online VLM queries by 85.7% and reducing behavioral violations by 33%.

Contribution

The paper introduces a cost-aware, zero-shot navigation method that uses multimodal LLMs and a visuosemantic cache for risk assessment and path selection, without task-specific training.

Findings

01. Achieves 10% higher goal-reaching rate
02. Reduces behavioral violations by 33%
03. Cuts online VLM queries by 85.7%

Abstract

Navigating unstructured environments requires assessing traversal risk relative to a robot's physical capabilities, a challenge that varies across embodiments. We present CATNAV, a cost-aware traversability navigation framework that leverages multimodal LLMs for zero-shot, embodiment-aware costmap generation without task-specific training. We introduce a visuosemantic caching mechanism that detects scene novelty and reuses prior risk assessments for semantically similar frames, reducing online VLM queries by 85.7%. Furthermore, we introduce a VLM-based trajectory selection module that evaluates proposals through visual reasoning to choose the safest path given behavioral constraints. We evaluate CATNAV on a quadruped robot across indoor and outdoor unstructured environments, comparing against state-of-the-art vision-language-action baselines. Across five navigation tasks, CATNAV…
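The caching idea described in the abstract (detect scene novelty, reuse prior risk assessments for semantically similar frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SimilarityCache`, the use of cosine similarity over frame embeddings, the threshold value, and the stand-in `fake_vlm` function are all assumptions made for illustration, and the listing does not say how CATNAV embeds frames or measures novelty.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Risk = Dict[str, float]  # hypothetical format: terrain label -> risk score


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimilarityCache:
    """Reuse a cached risk assessment when a frame resembles a past one.

    Assumption: frames arrive with a precomputed embedding. How CATNAV
    actually embeds frames and detects novelty is not given in the source.
    """

    def __init__(self, query_vlm: Callable[[str], Risk], threshold: float = 0.9):
        self.query_vlm = query_vlm      # the expensive call we want to avoid
        self.threshold = threshold      # assumed similarity cutoff
        self.entries: List[Tuple[List[float], Risk]] = []
        self.vlm_calls = 0

    def assess(self, embedding: Sequence[float], frame: str) -> Risk:
        # Find the most similar cached embedding, if any.
        best_score = -1.0
        best_index = -1
        for i, (cached_embedding, _) in enumerate(self.entries):
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_index = score, i

        # Cache hit: the scene is not novel, so reuse the prior assessment.
        if best_index >= 0 and best_score >= self.threshold:
            return self.entries[best_index][1]

        # Cache miss: the scene is novel, so query the VLM and store the result.
        risk = self.query_vlm(frame)
        self.vlm_calls += 1
        self.entries.append((list(embedding), risk))
        return risk


def fake_vlm(frame: str) -> Risk:
    """Stand-in for a real VLM query (hypothetical outputs)."""
    return {"grass": 0.2} if frame == "lawn" else {"stairs": 0.9}


cache = SimilarityCache(fake_vlm, threshold=0.95)
first = cache.assess([1.0, 0.0], "lawn")      # novel frame: queries the VLM
second = cache.assess([0.99, 0.05], "lawn")   # similar frame: served from cache
third = cache.assess([0.0, 1.0], "stairs")    # dissimilar frame: queries the VLM
```

In this toy run, three frames trigger only two VLM queries because the second frame is close enough to the first. A linear scan is used for clarity; a real system would likely bound the cache size and use a faster nearest-neighbor lookup.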

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robotics and Sensor-Based Localization