Making Videos Accessible for Blind and Low Vision Users Using a Multimodal Agent Video Player
Adriana Olmos, Anoop K. Sinha, Renelito Delos Santos, Ruben Rodriguez Rodriguez, James A. Landay, Sam S. Sepah, Philip Nelson, Shaun K. Kane

TL;DR
This paper presents a multimodal agent video player that enhances video accessibility for blind and low-vision users through interactive dialogue, fostering independence, trust, and personalized control over the viewing experience.
Contribution
Introduction of a novel multimodal large language model-based agent that provides interactive, accessible video experiences tailored to BLV users' needs.
Findings
BLV users value independence and personal agency in video accessibility.
The MAVP's conversational interface fosters trust and collaboration.
Meta-conversational dialogues about AI limitations help repair trust.
Abstract
Video content remains largely inaccessible to blind and low-vision (BLV) users. To address this, we introduce a prototype that leverages a multimodal agent, powered by a novel conversational architecture using a multimodal large language model (MLLM), to provide BLV users with an interactive, accessible video experience. This Multimodal Agent Video Player (MAVP) demonstrates that an interactive accessibility mode can be added to a video through multilayered prompt orchestration. We describe a user-centered design process involving 18 sessions with BLV users, which showed that BLV users do not just want accessibility features but desire independence and personal agency over the viewing experience. We conducted a qualitative study with an additional 8 BLV participants, in which we saw that the MAVP's conversational dialogue offers BLV users a sense of personal agency, fostering…
Taxonomy
Topics: Tactile and Sensory Interactions · Social Robot Interaction and HRI · Subtitles and Audiovisual Media
