Active Speakers in Context

Juan Leon Alcazar; Fabian Caba Heilbron; Long Mai; Federico Perazzi; Joon-Young Lee; Pablo Arbelaez; and Bernard Ghanem

arXiv:2005.09812 · cs.CV · May 21, 2020

TL;DR

This paper introduces Active Speaker Context, a long-term, multi-speaker representation that improves active speaker detection in multi-person scenes by modeling pairwise and temporal relations among candidate speakers, reaching 87.1% mAP on the AVA-ActiveSpeaker dataset.

Contribution

The paper proposes Active Speaker Context, a new representation that models relationships between multiple speakers over long time horizons, improving detection performance over existing methods that model only short-term audiovisual information from a single speaker.
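The idea of a structured multi-speaker context can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names (`build_context`, `pairwise_refine`, `temporal_refine`), the slot-filling rule, the dot-product attention, and the moving average are simple stand-ins, not the learned modules of the paper or its repository.

```python
import math

def build_context(features, reference, num_speakers):
    """Stack per-speaker features into context[t][s].

    features: dict mapping speaker id -> list of T feature vectors.
    Slot 0 always holds the reference speaker. Remaining slots are filled
    by cycling through the other speakers (a hypothetical padding rule).
    """
    others = [s for s in features if s != reference]
    pool = others or [reference]
    slots = [reference]
    i = 0
    while len(slots) < num_speakers:
        slots.append(pool[i % len(pool)])
        i += 1
    steps = len(features[reference])
    return [[features[s][t] for s in slots] for t in range(steps)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def pairwise_refine(step):
    """Mix the speakers of one timestep, weighted by similarity to the reference.

    Stand-in for a learned pairwise relation: dot-product scores, softmax weights.
    """
    ref = step[0]
    scores = [sum(a * b for a, b in zip(ref, other)) for other in step]
    weights = softmax(scores)
    return [sum(w * vec[d] for w, vec in zip(weights, step)) for d in range(len(ref))]

def temporal_refine(sequence, window=3):
    """Centered moving average over time, a stand-in for a learned temporal model."""
    half = window // 2
    out = []
    for t in range(len(sequence)):
        chunk = sequence[max(0, t - half):min(len(sequence), t + half + 1)]
        out.append([sum(v[d] for v in chunk) / len(chunk) for d in range(len(chunk[0]))])
    return out

# Toy usage: two speakers, three timesteps, 2-dimensional features.
features = {
    "ref": [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]],
    "other": [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]],
}
context = build_context(features, reference="ref", num_speakers=3)
refined = temporal_refine([pairwise_refine(step) for step in context])
print(len(refined), len(refined[0]))
```

Keeping the reference speaker in a fixed slot is what makes the ensemble "structured" in this sketch: any downstream step can tell which features belong to the speaker being classified and which belong to the surrounding candidates.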

Findings

01 Achieves 87.1% mAP on the AVA-ActiveSpeaker dataset.

02 A structured feature ensemble alone already benefits detection performance.

03 Long-term, multi-speaker analysis improves detection accuracy.
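For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of average precision (AP) for a binary speaking/not-speaking task. The function name `average_precision` is illustrative, and this non-interpolated form is an assumption: the official AVA-ActiveSpeaker evaluation may compute the value differently, and this page does not describe the protocol.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of precision values at each true positive.

    scores: predicted speaking confidences, higher means more likely speaking.
    labels: 1 if the face is actually speaking, 0 otherwise.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / total_positives

# Toy usage: positives ranked 1st and 3rd give (1/1 + 2/3) / 2.
print(round(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]), 4))  # 0.8333
```

A perfect ranking, with every speaking face scored above every silent one, yields an AP of 1.0 under this definition.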

Abstract

Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker. Although this strategy can be enough for addressing single-speaker scenarios, it prevents accurate detection when the task is to identify which of many candidate speakers is talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our Active Speaker Context is designed to learn pairwise and temporal relations from a structured ensemble of audio-visual observations. Our experiments show that a structured feature ensemble already benefits the active speaker detection performance. Moreover, we find that the proposed Active Speaker Context improves the state of the art on the AVA-ActiveSpeaker dataset, achieving a mAP of 87.1%. We present ablation studies that verify that…

Code & Models

Repositories

fuankarion/active-speakers-context
PyTorch · Official

Videos

Active Speakers in Context · YouTube

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing