ADCanvas: Accessible and Conversational Audio Description Authoring for Blind and Low Vision Creators

Franklin Mingzhe Li; Michael Xieyang Liu; Cynthia L. Bennett; Shaun K. Kane

arXiv:2602.07266·cs.HC·February 10, 2026

Open Access

TL;DR

ADCanvas is a multimodal, accessible audio description authoring tool that lets blind and low vision creators query video content, generate description scripts, and edit them through conversational and non-visual interfaces.

Contribution

The paper introduces ADCanvas, a multimodal system that combines conversational AI with accessible controls for end-to-end audio description creation by blind and low vision (BLV) creators.

Findings

01. Users adopt the system as an informational and drafting aide.

02. Participants maintain agency through verification and editing.

03. The system supports live visual question answering and script generation.

Abstract

Audio Description (AD) provides essential access to visual media for blind and low vision (BLV) audiences. Yet current AD production tools remain largely inaccessible to BLV video creators, who possess valuable expertise but face barriers due to visually driven interfaces. We present ADCanvas, a multimodal authoring system that supports non-visual control over AD creation. ADCanvas combines conversational interaction with keyboard-based playback control and a plain-text, screen-reader-accessible editor to support end-to-end AD authoring and visual question answering (VQA). Combining screen-reader-friendly controls with a multimodal LLM agent, ADCanvas supports live VQA, script generation, and AD modification. Through a user study with 12 BLV video creators, we find that users adopt the conversational agent as an informational aide and drafting assistant, while…

Taxonomy

Topics: Subtitles and Audiovisual Media · Multimodal Machine Learning Applications · Tactile and Sensory Interactions