Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval
Davide Buoso, Luke Robinson, Giuseppe Averta, Philip Torr, Tim Franzmeyer, Daniele De Martini

TL;DR
Select2Plan is a training-free high-level robot planning framework that leverages off-the-shelf VLMs, structured VQA, and ICL to enable adaptable navigation without extensive data or fine-tuning.
Contribution
The paper introduces Select2Plan, a novel training-free planning approach that utilizes VQA and ICL with VLMs for robot navigation, eliminating the need for task-specific training.
Findings
Improves navigation performance by approximately 50% in third-person view (TPV) scenarios.
Achieves results comparable to trained models in first-person view (FPV) scenarios with only 20 demonstrations.
Demonstrates adaptability across various scene types and sensing setups.
Abstract
This study explores the potential of off-the-shelf Vision-Language Models (VLMs) for high-level robot planning in the context of autonomous navigation. While most existing learning-based approaches to path planning require extensive task-specific training or fine-tuning, we demonstrate how such training can be avoided in most practical cases. To do this, we introduce Select2Plan (S2P), a novel training-free framework for high-level robot planning that completely eliminates the need for fine-tuning or specialised training. By leveraging structured Visual Question-Answering (VQA) and In-Context Learning (ICL), our approach drastically reduces the need for data collection, requiring a fraction of the task-specific data typically used by trained models, or even relying only on online data. Our method facilitates the effective use of a generally trained VLM in a flexible and…
Taxonomy
Topics: AI-based Problem Solving and Planning · Multimodal Machine Learning Applications · Natural Language Processing Techniques
