Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People
Zain Merchant, Abrar Anwar, Emily Wang, Souti Chattopadhyay, Jesse Thomason

TL;DR
This paper develops a dataset and explores how large pretrained language models can generate contextually relevant navigation instructions to assist blind and low-vision individuals in unfamiliar environments, validated through user studies.
Contribution
It introduces a new dataset of images and goals for navigation scenarios and demonstrates the effectiveness of language models in generating useful instructions for BLV users.
Findings
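In a sighted user study, large pretrained language models produced correct and useful navigation instructions.
Sighted participants perceived the generated instructions as beneficial for BLV users.
A survey and interview with 4 BLV users showed that preferred instruction styles differ by scenario.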
Abstract
Navigating unfamiliar environments presents significant challenges for blind and low-vision (BLV) individuals. In this work, we construct a dataset of images and goals across different scenarios such as searching through kitchens or navigating outdoors. We then investigate how grounded instruction generation methods can provide contextually-relevant navigational guidance to users in these instances. Through a sighted user study, we demonstrate that large pretrained language models can produce correct and useful instructions perceived as beneficial for BLV users. We also conduct a survey and interview with 4 BLV users and observe useful insights on preferences for different instructions based on the scenario.
Taxonomy
Topics: Tactile and Sensory Interactions · Digital Accessibility for Disabilities
