How Multimodal Large Language Models Support Access to Visual Information: A Diary Study With Blind and Low Vision People

Ricardo E. Gonzalez Penuela; Crescentia Jung; Sharon Y Lin; Ruiying Hu; Shiri Azenkot

arXiv:2602.13469 · cs.HC · February 20, 2026

Open Access

TL;DR

Through a two-week diary study with 20 blind and low vision participants, this paper examines how multimodal large language models help people access visual information, highlighting their strengths and limitations in real-world use.

Contribution

It introduces the 'visual assistant' skill concept and provides guidelines to enhance MLLM applications for BLV users based on real-world usage data.

Findings

01

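Participants rated the application's visual interpretations as trustworthy (mean 3.76 out of 5) and somewhat satisfying (mean 4.13 out of 5), even though the AI often produced incorrect answers (22.2%) or abstained (10.8%).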

02

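Unlike traditional visual interpretation tools that only provide descriptions, MLLM-enabled applications let users ask questions to obtain goal-relevant details.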

03

Supporting daily use requires reliable, goal-directed assistance.

Abstract

Multimodal large language models (MLLMs) are changing how Blind and Low Vision (BLV) people access visual information. Unlike traditional visual interpretation tools that only provide descriptions, MLLM-enabled applications offer conversational assistance, where users can ask questions to obtain goal-relevant details. However, evidence about their performance in the real world and implications for BLV people's daily lives remains limited. To address this, we conducted a two-week diary study, where we captured 20 BLV participants' use of an MLLM-enabled visual interpretation application. Although participants rated the visual interpretations of the application as "trustworthy" (mean=3.76 out of 5, max=extremely trustworthy) and "somewhat satisfying" (mean=4.13 out of 5, max=very satisfying), the AI often produced incorrect answers (22.2%) or abstained (10.8%) from responding to users'…

Taxonomy

Topics: Multimodal Machine Learning Applications · AI in Service Interactions · Artificial Intelligence in Healthcare and Education