Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models
Jingyi Xie, Rui Yu, He Zhang, Syed Masum Billah, Sooyeon Lee, John M. Carroll

TL;DR
This study explores how large multimodal models assist visually impaired users, revealing their capabilities and limitations in real-world social and personal contexts, and proposing design improvements for future assistive AI tools.
Contribution
The paper provides a detailed analysis of LMM-based visual assistance in daily life, highlighting key limitations and suggesting strategies to make these tools more effective and personalized.
Findings
LMMs often hallucinate social and contextual details.
They struggle to interpret user intentions accurately.
Design strategies can improve interaction quality.
Abstract
Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users of Be My AI (an LMM-based application) and analysis of its image descriptions from both study participants and social media platforms, we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second,…
Taxonomy
Topics: Tactile and Sensory Interactions · Interactive and Immersive Displays · Multimedia Communication and Technology
