Balancing Performance and Efficiency in Zero-shot Robotic Navigation
Dmytro Kuzmenko, Nadiya Shvai

TL;DR
This paper evaluates various vision-language models and modules for zero-shot robotic navigation, proposing an optimized solution that improves success rate and reduces memory usage in resource-limited settings.
Contribution
The paper introduces an optimized approach that balances navigation success and computational efficiency, achieving a higher success rate than the baseline while using less video memory.
Findings
Success rate increased by 1.55% over the VLFM BLIP-2 baseline
Video memory usage reduced by a factor of 2.3
Provided insights into resource-efficient deployment strategies
Abstract
We present an optimization study of Vision-Language Frontier Maps (VLFM) applied to the Object Goal Navigation task in robotics. Our work evaluates the efficiency and performance of various vision-language models, object detectors, segmentation models, and multi-modal comprehension and Visual Question Answering modules. Using two splits of the Habitat-Matterport 3D dataset, we conduct experiments on a desktop with limited VRAM. We propose a solution that improves the success rate by 1.55% over the VLFM BLIP-2 baseline without a substantial loss in success-weighted path length, while requiring 2.3 times less video memory. Our findings provide insights into balancing model performance and computational efficiency, suggesting effective deployment strategies for resource-limited environments.
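The two navigation metrics named in the abstract, success rate and success-weighted path length (SPL), can be illustrated with a minimal sketch. It uses the standard SPL definition, in which each successful episode contributes the ratio of the shortest-path length to the longer of the agent's path and the shortest path, averaged over all episodes. The `Episode` class, its field names, and the sample values are hypothetical and chosen for illustration only; they are not taken from the paper's code or results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One navigation episode (hypothetical container for illustration)."""
    success: bool          # did the agent stop at the goal object?
    shortest_path: float   # shortest start-to-goal path length, in metres
    agent_path: float      # length of the path the agent actually took


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success-weighted path length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if not e.success:
            continue  # failed episodes contribute zero
        denom = max(e.agent_path, e.shortest_path)
        # Guard against a degenerate episode where start equals goal.
        total += 1.0 if denom == 0 else e.shortest_path / denom
    return total / len(episodes)


# Made-up episodes, not data from the paper.
episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=10.0),
    Episode(success=False, shortest_path=4.0, agent_path=8.0),
    Episode(success=True, shortest_path=3.0, agent_path=3.0),
    Episode(success=True, shortest_path=2.0, agent_path=4.0),
]
print(f"SR  = {success_rate(episodes):.2f}")
print(f"SPL = {spl(episodes):.2f}")
```

Because SPL penalizes detours even on successful episodes, a configuration can raise its success rate while its SPL stays flat or drops slightly, which is the trade-off the abstract refers to.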
Taxonomy
Topics: Robotic Path Planning Algorithms · Space Satellite Systems and Control · Inertial Sensor and Navigation
