How Multimodal Large Language Models Support Access to Visual Information: A Diary Study With Blind and Low Vision People
Ricardo E. Gonzalez Penuela, Crescentia Jung, Sharon Y Lin, Ruiying Hu, Shiri Azenkot

TL;DR
Through a two-week diary study, this work evaluates how multimodal large language models (MLLMs) help blind and low vision (BLV) people access visual information, highlighting the models' strengths and limitations in real-world use.
Contribution
It introduces the 'visual assistant' skill concept and provides guidelines to enhance MLLM applications for BLV users based on real-world usage data.
Findings
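Participants rated the AI's visual interpretations as trustworthy (mean 3.76 out of 5) and somewhat satisfying (mean 4.13 out of 5), even though it often produced incorrect answers (22.2%) or abstained from responding (10.8%).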
MLLMs improve descriptive accuracy of visual information.
Supporting daily use requires reliable, goal-directed assistance.
Abstract
Multimodal large language models (MLLMs) are changing how Blind and Low Vision (BLV) people access visual information. Unlike traditional visual interpretation tools that only provide descriptions, MLLM-enabled applications offer conversational assistance, where users can ask questions to obtain goal-relevant details. However, evidence about their performance in the real world and implications for BLV people's daily lives remains limited. To address this, we conducted a two-week diary study, where we captured 20 BLV participants' use of an MLLM-enabled visual interpretation application. Although participants rated the visual interpretations of the application as "trustworthy" (mean=3.76 out of 5, max=extremely trustworthy) and "somewhat satisfying" (mean=4.13 out of 5, max=very satisfying), the AI often produced incorrect answers (22.2%) or abstained (10.8%) from responding to users'…
Taxonomy
Topics: Multimodal Machine Learning Applications · AI in Service Interactions · Artificial Intelligence in Healthcare and Education
