AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Tengda Han; Max Bain; Arsha Nagrani; Gül Varol; Weidi Xie; Andrew Zisserman

arXiv:2310.06838 · cs.CV · October 11, 2023 · 1 citation

TL;DR

This paper introduces a model for automatic movie audio description (AD) that addresses three questions: who is on screen (character identification), when AD should be generated (temporal decision models), and what should be said (a new vision-language architecture). The aim is higher-quality descriptions for visually impaired audiences.

Contribution

It presents a novel integrated approach combining character banks, temporal models, and a new vision-language architecture for improved automatic movie audio description.

Findings

01. Enhanced character naming accuracy in AD
02. Effective temporal interval selection for AD generation
03. Improved AD quality over previous models

Abstract

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences. For movies, this presents notable challenges -- AD must occur only during existing pauses in dialogue, should refer to characters by name, and ought to aid understanding of the storyline as a whole. To this end, we develop a new model for automatically generating movie AD, given CLIP visual features of the frames, the cast list, and the temporal locations of the speech; addressing all three of the 'who', 'when', and 'what' questions: (i) who -- we introduce a character bank consisting of the character's name, the actor that played the part, and a CLIP feature of their face, for the principal cast of each movie, and demonstrate how this can be used to improve naming in the generated AD; (ii) when -- we investigate several models for…
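
As an illustration of the 'who' and 'when' inputs named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a character bank stored as (character name, actor, face feature) entries, matches a frame feature to the bank by thresholded cosine similarity, and lists the pauses between speech segments where AD could be placed. The names `CharacterEntry`, `characters_in_frame`, and `dialogue_pauses`, the similarity threshold, the minimum gap, and the toy 2-dimensional vectors standing in for CLIP features are all hypothetical; the abstract does not state how the bank is matched against frames.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass
class CharacterEntry:
    """One character-bank entry (hypothetical layout)."""
    character: str
    actor: str
    face_feature: List[float]  # stand-in for a CLIP face feature


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def characters_in_frame(frame_feature: List[float],
                        bank: List[CharacterEntry],
                        threshold: float = 0.8) -> List[str]:
    """Return names whose face feature is close to the frame feature.

    Thresholded cosine similarity is an assumption made for this sketch.
    """
    return [e.character for e in bank
            if cosine(frame_feature, e.face_feature) >= threshold]


def dialogue_pauses(speech: List[Tuple[float, float]],
                    movie_end: float,
                    min_gap: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) gaps between speech segments of at least min_gap seconds."""
    pauses = []
    cursor = 0.0  # end time of the speech seen so far
    for start, end in sorted(speech):
        if start - cursor >= min_gap:
            pauses.append((cursor, start))
        cursor = max(cursor, end)
    if movie_end - cursor >= min_gap:
        pauses.append((cursor, movie_end))
    return pauses


# Toy data: placeholder names and 2-d vectors instead of real CLIP features.
bank = [
    CharacterEntry("Alice", "Actor A", [1.0, 0.0]),
    CharacterEntry("Bob", "Actor B", [0.0, 1.0]),
]
visible = characters_in_frame([0.9, 0.1], bank)
pauses = dialogue_pauses([(2.0, 5.0), (5.5, 9.0)], movie_end=12.0)
```

With this toy data, `visible` contains only "Alice", and `pauses` contains the gap before the first speech segment and the gap after the last one; the half-second gap between the two segments is skipped because it is shorter than `min_gap`.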

Taxonomy

Topics: Subtitles and Audiovisual Media · Infrastructure Maintenance and Monitoring

Methods: Contrastive Language-Image Pre-training