Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions

Meng Chen; Akhil Iyer; Amy Pavel

arXiv:2507.15692 · cs.HC · July 22, 2025

TL;DR

This paper introduces a system that presents multiple variations of MLLM-generated image descriptions to help blind and low vision users better detect unreliable information, improving safety and decision-making.

Contribution

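It contributes a design space for eliciting and presenting variations in MLLM descriptions and a prototype system implementing three variation presentation styles, supported by a user study showing that variations improve detection of unreliable information.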

Findings

01. Variations increased detection of unreliable claims by 4.9x

02. Participants preferred seeing multiple descriptions over a single one

03. All participants showed interest in using the system for real-world tasks

Abstract

Multimodal large language models (MLLMs) provide new opportunities for blind and low vision (BLV) people to access visual information in their daily lives. However, these models often produce errors that are difficult to detect without sight, posing safety and social risks in scenarios from medication identification to outfit selection. While BLV MLLM users use creative workarounds such as cross-checking between tools and consulting sighted individuals, these approaches are often time-consuming and impractical. We explore how systematically surfacing variations across multiple MLLM responses can support BLV users to detect unreliable information without visually inspecting the image. We contribute a design space for eliciting and presenting variations in MLLM descriptions, a prototype system implementing three variation presentation styles, and findings from a user study with 15 BLV…
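The core idea in the abstract, comparing several MLLM responses for the same image and surfacing where they disagree, can be illustrated with a minimal sketch. This is not the authors' method or prototype: the word-overlap heuristic, the function names (`content_words`, `agreement_by_word`, `split_by_agreement`), the stopword list, and the sample descriptions are all assumptions made up for illustration. A real system would likely compare claims semantically instead of matching tokens.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "and", "with", "to"}


def content_words(description: str) -> Set[str]:
    """Lowercase a description and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return {t for t in tokens if t not in STOPWORDS}


def agreement_by_word(descriptions: List[str]) -> Dict[str, float]:
    """Map each content word to the fraction of descriptions that mention it."""
    if not descriptions:
        return {}
    counts: Counter = Counter()
    for d in descriptions:
        # Each description contributes at most one count per word.
        counts.update(content_words(d))
    n = len(descriptions)
    return {word: count / n for word, count in counts.items()}


def split_by_agreement(
    descriptions: List[str], threshold: float = 1.0
) -> Tuple[List[str], List[str]]:
    """Split words into those at or above the agreement threshold and the rest."""
    scores = agreement_by_word(descriptions)
    consistent = sorted(w for w, s in scores.items() if s >= threshold)
    varying = sorted(w for w, s in scores.items() if s < threshold)
    return consistent, varying


# Hypothetical responses for one image (invented sample data).
descriptions = [
    "A white pill bottle labeled ibuprofen on a table.",
    "A white pill bottle labeled aspirin on a table.",
    "A white bottle labeled ibuprofen on a wooden table.",
]
consistent, varying = split_by_agreement(descriptions)
print("Consistent across all responses:", consistent)
print("Varies between responses:", varying)
```

In this toy case the medication name lands in the varying group, which is the kind of detail a user might want flagged before acting on a description. Lowering `threshold` would treat majority agreement as sufficient, trading sensitivity for fewer flags.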
