Speaker Naming in Movies

Mahmoud Azab; Mingzhe Wang; Max Smith; Noriyuki Kojima; Jia Deng; Rada Mihalcea

arXiv:1809.08761 · cs.CL · September 25, 2018 · 1 citation

TL;DR

This paper introduces a multimodal model for speaker naming in movies that combines visual, textual, and acoustic data. The model outperforms several competitive baselines on a new dataset, and a memory network built on it achieves state-of-the-art results on the subtitles task of the MovieQA 2017 Challenge.

Contribution

The paper presents a novel unified optimization framework for multimodal speaker naming and an end-to-end memory network that advances the state-of-the-art in movie subtitle understanding.

Findings

01

Significant improvement over several competitive baselines in speaker naming, measured by average weighted F-score

02

New dataset of six episodes of The Big Bang Theory and eighteen full movies covering different genres

03

State-of-the-art results on MovieQA 2017 Challenge subtitles task

Abstract

We propose a new model for speaker naming in movies that leverages visual, textual, and acoustic modalities in a unified optimization framework. To evaluate the performance of our model, we introduce a new dataset consisting of six episodes of The Big Bang Theory TV show and eighteen full movies covering different genres. Our experiments show that our multimodal model significantly outperforms several competitive baselines on the average weighted F-score metric. To demonstrate the effectiveness of our framework, we design an end-to-end memory network model that leverages our speaker naming model and achieves state-of-the-art results on the subtitles task of the MovieQA 2017 Challenge.
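
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of a support-weighted F-score for speaker naming. It assumes the common definition (per-speaker F1, weighted by how often each speaker appears in the gold labels); the abstract does not spell out the exact formula, so this may differ from the paper's evaluation script. The function name `weighted_f_score` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_f_score(gold, predicted):
    """Support-weighted average of per-speaker F1 scores.

    Assumed definition: each speaker's F1 is weighted by that speaker's
    share of the gold labels. Not taken from the paper's code.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0

    support = Counter(gold)  # number of gold utterances per speaker
    total = len(gold)
    score = 0.0

    for speaker, count in support.items():
        # True positives: utterances correctly assigned to this speaker.
        tp = sum(1 for g, p in zip(gold, predicted) if g == speaker and p == speaker)
        # False positives: utterances wrongly assigned to this speaker.
        fp = sum(1 for g, p in zip(gold, predicted) if g != speaker and p == speaker)
        fn = count - tp  # gold utterances of this speaker that were missed

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)

        score += (count / total) * f1

    return score


# Toy example: four subtitle lines with illustrative speaker labels.
gold = ["Sheldon", "Sheldon", "Leonard", "Penny"]
predicted = ["Sheldon", "Leonard", "Leonard", "Penny"]
print(weighted_f_score(gold, predicted))
```

Weighting by support keeps frequent speakers from being drowned out by rare ones, which matters in movies where a few main characters account for most of the dialogue.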

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Memory Network