CrossA11y: Identifying Video Accessibility Issues via Cross-modal Grounding

Xingyu "Bruce" Liu; Ruolin Wang; Dingzeyu Li; Xiang 'Anthony' Chen; Amy Pavel

arXiv:2208.11144 · cs.HC · February 19, 2025

TL;DR

CrossA11y is a system that automatically detects visual and auditory accessibility issues in videos using cross-modal analysis, making it easier for authors to add effective audio descriptions and closed captions.

Contribution

This paper introduces CrossA11y, a novel system leveraging cross-modal grounding to automatically identify accessibility issues in videos, improving efficiency over manual methods.

Findings

01 Effective detection of accessibility issues demonstrated in a lab study

02 Participants found CrossA11y intuitive and helpful for editing videos

03 Compared with a baseline, CrossA11y improved identification of accessibility issues

Abstract

Authors make their videos visually accessible by adding audio descriptions (AD), and auditorily accessible by adding closed captions (CC). However, creating AD and CC is challenging and tedious, especially for non-professional describers and captioners, due to the difficulty of identifying accessibility problems in videos. A video author will have to watch the video through and manually check for inaccessible information frame-by-frame, for both visual and auditory modalities. In this paper, we present CrossA11y, a system that helps authors efficiently detect and address visual and auditory accessibility issues in videos. Using cross-modal grounding analysis, CrossA11y automatically measures accessibility of visual and audio segments in a video by checking for modality asymmetries. CrossA11y then displays these segments and surfaces visual and audio accessibility issues in a unified…
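
The abstract states the core mechanism only at a high level: segments in one modality are compared against the other, and segments with a modality asymmetry are surfaced. The following is a minimal sketch of one way such a check could work, not CrossA11y's actual implementation. It assumes each visual segment and each audio segment has already been embedded into a shared vector space (for example by a joint image/text or audio/text model). The embedding step, the max-cosine matching rule, the 0.5 threshold, and all function names (`cosine`, `grounding_scores`, `flag_asymmetries`) are hypothetical. Under these assumptions, a visual segment that no audio segment matches well is a candidate for an audio description, and an audio segment with no good visual match is a candidate for a caption.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def grounding_scores(source: List[Vector], target: List[Vector]) -> List[float]:
    """For each source segment, the best similarity to any target segment.

    An empty target list yields 0.0 (nothing in the other modality to ground to).
    """
    return [max((cosine(s, t) for t in target), default=0.0) for s in source]


def flag_asymmetries(
    visual: List[Vector],
    audio: List[Vector],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[List[int], List[int]]:
    """Return indices of weakly grounded visual and audio segments."""
    # Visual content poorly matched by any audio segment: candidate for an audio description (AD).
    visual_scores = grounding_scores(visual, audio)
    # Audio content poorly matched by any visual segment: candidate for a closed caption (CC).
    audio_scores = grounding_scores(audio, visual)
    visual_issues = [i for i, s in enumerate(visual_scores) if s < threshold]
    audio_issues = [j for j, s in enumerate(audio_scores) if s < threshold]
    return visual_issues, audio_issues
```

For example, with visual embeddings `[[1.0, 0.0], [0.0, 1.0]]` and a single audio embedding `[1.0, 0.1]`, `flag_asymmetries` returns `([1], [])`. The second visual segment has no audio counterpart and would be flagged for description, while the audio segment is grounded by the first visual segment. A fuller version would likely restrict matching to temporally nearby segments rather than comparing against the whole video.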