WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
Zhiqiang Yuan, Ting Zhang, Ying Deng, Jiapei Zhang, Yeshuang Zhu, Zexi Jia, Jie Zhou, Jinchao Zhang

TL;DR
This paper introduces WalkVLM, a vision-language model designed to assist visually impaired individuals by providing real-time walking guidance, supported by a new large-scale dataset and a benchmark for evaluating such systems.
Contribution
The paper presents the first large-scale walking assistance dataset and a novel WalkVLM model with hierarchical planning and temporal-aware prediction for improved guidance.
Findings
WalkVLM outperforms other VLMs on streaming video processing tasks.
The dataset enables standardized training and evaluation for walking assistance.
WalkVLM generates concise, informative reminders effectively.
Abstract
Approximately 200 million individuals around the world suffer from varying degrees of visual impairment, making it crucial to leverage AI technology to offer walking assistance for these people. With the recent progress of vision-language models (VLMs), applying VLMs to offer walking guidance has become popular. However, the existing methods of walking guidance are mainly based on self-curated question-answering datasets that are not publicly accessible, without a standardized benchmark for training or evaluation. Moreover, walking assistance often requires real-time streaming video analysis and the generation of concise yet informative reminders, making VLMs struggle due to excessive responses and low efficiency in inferences. In this paper, we introduce the first large-scale dataset dedicated to walking assistance, comprising 12,000 video-annotation pairs, to provide a unified…
Taxonomy
Topics: Tactile and Sensory Interactions
