TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

Chaeyoung Jung; Suyeon Lee; Kihyun Nam; Kyeongha Rho; You Jin Kim; Youngjoon Jang; Joon Son Chung

arXiv:2309.12306 · cs.CV · September 22, 2023 · 1 citation

TL;DR

TalkNCE introduces a talk-aware contrastive loss that enhances active speaker detection by leveraging speech and facial movement correspondence, achieving state-of-the-art results without extra supervision.

Contribution

The paper proposes a novel contrastive loss that improves active speaker detection by applying the loss only to segments where the on-screen person is speaking. It is compatible with existing models and requires no additional supervision or training data.

Findings

01. Achieves state-of-the-art performance on the AVA-ActiveSpeaker dataset.

02. Integrates with existing ASD frameworks and improves their performance.

03. Improves representation learning for active speaker detection.

Abstract

The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is only applied to part of the full segments where a person on the screen is actually speaking. This encourages the model to learn effective representations through the natural correspondence of speech and facial movements. Our loss can be jointly optimized with the existing objectives for training ASD models without the need for additional supervision or training data. The experiments demonstrate that our loss can be easily integrated into the existing ASD frameworks, improving their performance. Our method achieves…
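The idea of applying a contrastive loss only to speaking frames can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an InfoNCE-style formulation in which the audio and visual embeddings of the same speaking frame form a positive pair and the other speaking frames in the same segment serve as negatives; the abstract does not confirm these details. The function name `talk_aware_nce`, its parameters, and the `temperature` value are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def talk_aware_nce(audio, visual, speaking, temperature=0.1):
    """InfoNCE-style loss restricted to frames labelled as speaking.

    audio, visual: per-frame embedding vectors (lists of equal length T).
    speaking: per-frame 0/1 labels; only frames with label 1 contribute.
    Assumption: positives are same-frame audio-visual pairs, negatives are
    the other speaking frames of the same segment.
    """
    active = [t for t, is_speaking in enumerate(speaking) if is_speaking]
    if len(active) < 2:
        # No negatives are available, so the loss is skipped.
        return 0.0

    a = [_normalize(audio[t]) for t in active]
    v = [_normalize(visual[t]) for t in active]

    total = 0.0
    for i in range(len(active)):
        # Cosine similarity of visual frame i to every active audio frame.
        logits = [_dot(v[i], a_j) / temperature for a_j in a]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(active)


# Toy segment: three frames, the last one is silent and therefore ignored.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
aligned_visual = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
swapped_visual = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
speaking = [1, 1, 0]

aligned_loss = talk_aware_nce(audio, aligned_visual, speaking)
swapped_loss = talk_aware_nce(audio, swapped_visual, speaking)
```

In this toy segment the aligned pair yields a much smaller loss than the swapped pair, which is the behaviour a correspondence objective should have. For joint training, such a term would typically be added to the existing ASD objective with a weighting factor, for example `total = asd_loss + weight * talk_aware_nce(...)`, where `asd_loss` and `weight` are placeholders rather than values taken from the paper.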

Code & Models

Repositories

kaistmm/TalkNCE (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies