Urban Risk-Aware Navigation via VQA-Based Event Maps for People with Low Vision
Antoni Valls, Jordi Sanchez-Riera

TL;DR
This paper introduces a VQA-based event map framework using Vision-Language Models to enhance urban navigation safety for people with low vision, supported by a new diverse dataset and benchmarking of multiple models.
Contribution
It presents a novel hierarchical VQA-based event map system for hazard detection, along with a large, diverse dataset and a comprehensive benchmark of state-of-the-art models.
Findings
Multimodal large language models (MLLMs) outperform classification-based approaches in hazard detection.
Qwen-VL achieves the best balance of precision and recall.
The framework enables flexible, risk-aware urban navigation for visually impaired users.
Abstract
Visual impairment affects hundreds of millions of people worldwide, severely limiting their ability to navigate urban environments safely and independently. While wearable assistive devices offer a promising platform for real-time hazard detection, existing approaches rely on task-specific vision pipelines that lack flexibility and generalizability. In this work, we propose an event map framework based on visual question answering that leverages Vision-Language Models (VLMs) for pedestrian scene description and hazard identification across diverse real-world environments, using a three-level hierarchical query structure to enable fine-grained scene understanding without task-specific retraining. Model responses are aggregated into a weighted risk scoring system that maps street segments into four discrete safety categories, producing navigable risk-aware event maps for route planning.…
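As a rough illustration of the weighted risk scoring described in the abstract, the sketch below aggregates yes/no VQA answers for one street segment into a normalized score and maps it to one of four discrete categories. The abstract does not give the hazard list, the weights, the thresholds, or the category names, so everything named here (HAZARD_WEIGHTS, CATEGORY_BOUNDS, risk_score, categorize, and the labels) is a hypothetical placeholder rather than the authors' actual configuration.

```python
from typing import Mapping

# Hypothetical hazard weights (assumption): larger means more dangerous.
HAZARD_WEIGHTS = {
    "construction": 3.0,
    "obstacle_on_sidewalk": 2.0,
    "uneven_surface": 1.0,
}

# Hypothetical upper bounds (exclusive) for the first three categories;
# any score at or above the last bound falls into the fourth category.
CATEGORY_BOUNDS = [
    (0.25, "safe"),
    (0.50, "low_risk"),
    (0.75, "moderate_risk"),
]
HIGHEST_CATEGORY = "high_risk"


def risk_score(answers: Mapping[str, bool]) -> float:
    """Return the weighted fraction of hazards reported for a segment, in [0, 1].

    `answers` maps a hazard name to the model's yes/no answer;
    hazards missing from the mapping count as "no".
    """
    total = sum(HAZARD_WEIGHTS.values())
    hit = sum(w for name, w in HAZARD_WEIGHTS.items() if answers.get(name, False))
    return hit / total


def categorize(score: float) -> str:
    """Map a normalized risk score to one of four discrete safety categories."""
    for bound, label in CATEGORY_BOUNDS:
        if score < bound:
            return label
    return HIGHEST_CATEGORY


if __name__ == "__main__":
    segment_answers = {"obstacle_on_sidewalk": True}
    score = risk_score(segment_answers)
    print(f"{score:.2f} -> {categorize(score)}")
```

A route planner could then treat the category of each segment as an edge cost, though how the paper actually integrates the event map into route planning is not stated in the truncated abstract.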
