A Multimodal Assistive System for Product Localization and Retrieval for People who are Blind or have Low Vision
Ligao Ruan, Giles Hamilton-Fletcher, Mahya Beheshti, Todd E Hudson, Maurizio Porfiri, John-Ross Rizzo

TL;DR
This paper introduces a multimodal wearable assistive system that combines object detection, vision-language models, and auditory guidance to help people who are blind or have low vision locate and retrieve products independently in shopping environments.
Contribution
The paper presents a novel multimodal system that integrates detection, navigation, and correction modules designed specifically for assistive shopping by pBLV.
Findings
Product detection accuracy near 100% at close range
Navigation accuracy up to 94.4% with vision-language models
Correction accuracy exceeds 86% under optimal conditions
Abstract
Shopping is a routine activity for sighted individuals, yet for people who are blind or have low vision (pBLV), locating and retrieving products in physical environments remains a challenge. This paper presents a multimodal wearable assistive system that integrates object detection with vision-language models to support independent product or item retrieval, with the goal of enhancing users' autonomy and sense of agency. The system operates through three phases: product search, which identifies target products using YOLO-World detection combined with embedding similarity and color histogram matching; product navigation, which provides spatialized sonification and VLM-generated verbal descriptions to guide users toward the target; and product correction, which verifies whether the user has reached the correct product and provides corrective feedback when necessary. Technical evaluation…
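The product search phase described in the abstract combines embedding similarity with color histogram matching to pick the target among detected candidates. The abstract does not say how the two cues are fused, so the following is only a minimal sketch under stated assumptions: candidate crops are taken to come from an upstream detector, embeddings are plain float vectors, the histogram is a coarse joint RGB histogram compared by intersection, and the scores are fused by a weighted sum. The names `match_score`, `best_candidate`, and the weight `w_embed` are hypothetical and not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram from (r, g, b) tuples with values in 0..255."""
    hist = [0.0] * (bins_per_channel ** 3)
    step = 256 / bins_per_channel
    for r, g, b in pixels:
        idx = (int(r // step) * bins_per_channel + int(g // step)) * bins_per_channel + int(b // step)
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def match_score(ref_emb, cand_emb, ref_pixels, cand_pixels, w_embed=0.5):
    """Weighted sum of the two cues; this fusion rule is an assumption, not the paper's."""
    emb = cosine_similarity(ref_emb, cand_emb)
    col = histogram_intersection(color_histogram(ref_pixels), color_histogram(cand_pixels))
    return w_embed * emb + (1.0 - w_embed) * col

def best_candidate(ref_emb, ref_pixels, candidates, w_embed=0.5):
    """Return (index, score) of the highest-scoring candidate detection."""
    scores = [
        match_score(ref_emb, c["embedding"], ref_pixels, c["pixels"], w_embed)
        for c in candidates
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy usage: a red reference product against one blue and one red candidate.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
candidates = [
    {"embedding": [0.0, 1.0], "pixels": blue},
    {"embedding": [1.0, 0.0], "pixels": red},
]
index, score = best_candidate([1.0, 0.0], red, candidates)
```

A color cue of this kind can help separate visually similar packages that differ mainly in hue, which is one plausible reason to pair it with an embedding score; the relative weight would need tuning on real data.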
Taxonomy
Topics: Tactile and Sensory Interactions · Gaze Tracking and Assistive Technology · Hand Gesture Recognition Systems
