Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants
Chongyang Li, Zhiqiang Yuan, Hanbo Bi, Zexi Jia, Jinchao Zhang

TL;DR
This paper introduces WalkVLM-LR, a vision language model designed for walking assistance that reduces output and temporal redundancy, improving efficiency and informativeness for visually impaired users.
Contribution
The paper presents WalkVLM-LR, which pairs four human-preference-based reward functions within a GRPO-based reasoning framework with an environment awareness discriminator, making outputs more concise and reducing redundancy in walking assistance systems.
Findings
Achieves state-of-the-art performance in output conciseness.
Effectively reduces temporal redundancy in scene assessment.
Improves environmental risk assessment for visually impaired users.
Abstract
Approximately 283 million people worldwide live with visual impairments, motivating growing research into using Visual Language Models (VLMs) to build effective walking assistance systems for blind and low-vision individuals. However, existing VLMs applied to the walking assistance task often produce outputs with considerable redundancy and extraneous detail, which impairs users' ability to assess their surroundings accurately. Moreover, these models typically cannot proactively assess environmental risks or adaptively trigger reminders when the scene calls for them, leading to excessive temporal redundancy. To mitigate both output and temporal redundancy, we propose WalkVLM-LR, a walking assistance model with less redundancy. To reduce output redundancy, we introduce four human-preference-based custom reward functions within the GRPO-based reasoning framework to…
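As a rough illustration of how a reward can steer a GRPO-style policy toward shorter outputs, the sketch below scores a group of sampled responses and converts the scores into group-relative advantages (each reward minus the group mean, divided by the group standard deviation). The function `conciseness_reward`, its `max_words` budget, and the example responses are hypothetical and chosen only for illustration; they are not the four reward functions described in the paper, which the truncated abstract does not specify.

```python
from statistics import mean, pstdev


def conciseness_reward(response: str, max_words: int = 20) -> float:
    """Hypothetical reward (an assumption, not from the paper).

    Returns 1.0 when the response is within the word budget and decays
    linearly to 0.0 once the response reaches twice the budget.
    """
    n_words = len(response.split())
    if n_words <= max_words:
        return 1.0
    return max(0.0, 1.0 - (n_words - max_words) / max_words)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as GRPO does.

    Population standard deviation is used here; implementations differ
    on population vs. sample std, so treat this as one possible choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three made-up candidate reminders for the same scene.
    group = [
        "Stairs ahead, three steps down.",
        " ".join(["detail"] * 30),  # 30 words: over budget
        " ".join(["detail"] * 40),  # 40 words: twice the budget
    ]
    rewards = [conciseness_reward(r) for r in group]
    advantages = group_relative_advantages(rewards)
    for r, a in zip(rewards, advantages):
        print(f"reward={r:.2f} advantage={a:+.3f}")
```

Under these assumptions the shortest response receives a positive advantage and the longest a negative one, so a policy update would shift probability toward concise reminders. A real system would combine several rewards and would also need to check that the shortened output remains informative.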
