ADCanvas: Accessible and Conversational Audio Description Authoring for Blind and Low Vision Creators
Franklin Mingzhe Li, Michael Xieyang Liu, Cynthia L. Bennett, Shaun K. Kane

TL;DR
ADCanvas is a multimodal, accessible audio description authoring tool that enables blind and low vision creators to draft and modify audio descriptions, and to ask questions about visual media, through conversational and non-visual interfaces.
Contribution
The paper introduces ADCanvas, a multimodal system that combines conversational AI with accessible controls for end-to-end audio description creation by BLV creators.
Findings
Users adopt the system as an informational and drafting aide.
Participants maintain agency through verification and editing.
The system supports live visual question answering and script generation.
Abstract
Audio Description (AD) provides essential access to visual media for blind and low vision (BLV) audiences. Yet current AD production tools remain largely inaccessible to BLV video creators, who possess valuable expertise but face barriers due to visually driven interfaces. We present ADCanvas, a multimodal authoring system that supports non-visual control over AD creation. ADCanvas combines conversational interaction with keyboard-based playback control and a plain-text, screen reader-accessible editor to support end-to-end AD authoring and visual question answering (VQA). Combining screen reader-friendly controls with a multimodal LLM agent, ADCanvas supports live VQA, script generation, and AD modification. Through a user study with 12 BLV video creators, we find that users adopt the conversational agent as an informational aide and drafting assistant, while…
Taxonomy
Topics: Subtitles and Audiovisual Media · Multimodal Machine Learning Applications · Tactile and Sensory Interactions
