LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward
Yi Zhao, Siqi Wang, Jing Li

TL;DR
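This paper introduces LaF-GRPO, a method that post-trains a vision-language model (VLM) using rewards from an LLM that simulates a visually impaired follower. The trained VLM generates precise, in-situ navigation instructions for visually impaired users, improving accuracy and usability while reducing the need for real-world data collection.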
Contribution
The paper proposes LaF-GRPO, an approach to in-situ navigation instruction generation in which an LLM acting as the follower supplies the reward for GRPO post-training of a VLM. It also introduces NIG4VI, a 27k-sample open-source dataset for training and evaluation.
Findings
LaF-GRPO significantly improves BLEU and METEOR scores.
The method produces more intuitive and safer instructions.
The approach reduces the need for costly real-world data collection.
Abstract
Navigation instruction generation for visually impaired (VI) individuals (NIG-VI) is critical yet relatively underexplored. This study focuses on generating precise, in-situ, step-by-step navigation instructions that are practically usable for VI users. Specifically, we propose LaF-GRPO (LLM-as-Follower GRPO), where an LLM simulates VI user responses to navigation instructions, thereby providing feedback rewards to guide the post-training of a Vision-Language Model (VLM). This enhances instruction accuracy and usability while reducing costly real-world data collection needs. To address the scarcity of dedicated benchmarks in this field, we introduce NIG4VI, a 27k-sample open-source dataset to facilitate training and evaluation. It comprises diverse navigation scenarios with accurate spatial coordinates, supporting detailed and open-ended in-situ instruction generation. Experiments on…
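The reward loop described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used GRPO formulation, in which each sampled instruction's reward is normalized by the mean and standard deviation of its sampling group. The names `toy_follower` and `follower_reward` are hypothetical: a keyword rule stands in for the LLM follower, and the binary reward is an assumption, since the excerpt above does not specify the paper's actual reward design.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_follower(instruction):
    """Hypothetical stand-in for the LLM follower: maps an instruction to an action."""
    text = instruction.lower()
    if "left" in text:
        return "turn_left"
    if "right" in text:
        return "turn_right"
    return "go_straight"


def follower_reward(predicted_action, reference_action):
    """Assumed binary reward: 1.0 when the simulated follower acts as intended."""
    return 1.0 if predicted_action == reference_action else 0.0


# One group of candidate instructions sampled for the same scene (made-up examples).
candidates = [
    "Turn left at the curb.",
    "Walk forward.",
    "Turn right now.",
    "Bear left after two steps.",
]
rewards = [follower_reward(toy_follower(c), "turn_left") for c in candidates]
advantages = group_relative_advantages(rewards)
```

Instructions that lead the simulated follower to the intended action receive positive advantages, and the rest receive negative ones. When every candidate in a group earns the same reward, all advantages are zero and that group contributes no learning signal.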
Taxonomy
Topics: Tactile and Sensory Interactions · Multimodal Machine Learning Applications · Advanced Neural Network Applications
