ADx3: A Collaborative Workflow for High-Quality Accessible Audio Description

Lana Do; Shasta Ihorn; Charity Pitcher-Cooper; Juvenal Francisco Barajas; Gio Jung; Xuan Duy Anh Nguyen; Sanjay Mirani; and Ilmi Yoon

arXiv:2602.02684·cs.HC·February 4, 2026

Open Access

TL;DR

ADx3 is a collaborative framework that combines AI and human input to produce high-quality audio descriptions for videos, improving accessibility for blind and low-vision audiences.

Contribution

It introduces a novel integrated workflow with vision-language models, user editing, and on-demand queries to enhance audio description quality and accessibility.

Findings

01. VLMs can generate acceptable descriptions with proper prompting

02. Human edits significantly improve description quality

03. Interactive queries help identify gaps in AI-generated descriptions

Abstract

Audio description (AD) makes video content accessible to blind and low-vision (BLV) audiences, but producing high-quality descriptions is resource-intensive. Automated AD offers scalability, and prior studies show human-in-the-loop editing and user queries effectively improve narration. We introduce ADx3, a novel framework integrating these three modules: GenAD, upgrading baseline description generation with modern vision-language models (VLMs) guided by accessibility-informed prompting; RefineAD, supporting BLV and sighted users to view and edit drafts through an inclusive interface; and AdaptAD, enabling on-demand user queries. We evaluated GenAD in a study where seven accessibility specialists reviewed VLM-generated descriptions using professional guidelines. Findings show that with tailored prompting, VLMs produce good descriptions meeting basic standards, but excellent descriptions…
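To make the three-module structure concrete, the following is a minimal sketch of how drafts might flow from generation through editing to on-demand queries. It is not the authors' implementation: the `Description` class, the function names `gen_ad`, `refine_ad`, and `adapt_ad`, and the stand-in callables `fake_describe` and `fake_answer` are all hypothetical, and no real vision-language model is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Description:
    """One audio-description clip tied to a video timestamp (hypothetical type)."""
    start_sec: float
    text: str
    edited_by_human: bool = False


def gen_ad(scene_times: List[float],
           describe: Callable[[float], str]) -> List[Description]:
    """GenAD-like step: draft one description per scene via a VLM stand-in."""
    return [Description(t, describe(t)) for t in scene_times]


def refine_ad(drafts: List[Description],
              edits: Dict[int, str]) -> List[Description]:
    """RefineAD-like step: apply human edits, keyed by draft index."""
    refined = []
    for i, draft in enumerate(drafts):
        if i in edits:
            refined.append(Description(draft.start_sec, edits[i], edited_by_human=True))
        else:
            refined.append(draft)
    return refined


def adapt_ad(question: str, at_sec: float,
             answer: Callable[[str, float], str]) -> str:
    """AdaptAD-like step: answer an on-demand viewer question at a timestamp."""
    return answer(question, at_sec)


# Stand-ins for model calls (assumptions; a real system would query a VLM here).
def fake_describe(t: float) -> str:
    return f"Scene at {t:.0f}s"


def fake_answer(question: str, t: float) -> str:
    return f"Answer to '{question}' at {t:.0f}s"


drafts = gen_ad([0.0, 12.0], fake_describe)
final = refine_ad(drafts, {1: "A woman opens a red door."})
reply = adapt_ad("What color is the door?", 12.0, fake_answer)
```

Passing the model calls in as plain callables keeps the sketch runnable without any external service and shows only the hand-off between the stages.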

Taxonomy

Topics: Subtitles and Audiovisual Media · Translation Studies and Practices · Text Readability and Simplification