UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022

Yuanhang Zhang, Susan Liang, Shuang Yang, Shiguang Shan

arXiv:2206.10861 · cs.CV · June 23, 2022

TL;DR

This paper introduces UniCon+, a state-of-the-art active speaker detection model that extends the authors' earlier Unified Context Network (UniCon) and Extended UniCon with a GRU-based module. It achieves 94.47% mAP on the AVA-ActiveSpeaker test set, ranking first at ActivityNet Challenge 2022.

Contribution

UniCon+ extends the Unified Context Network (UniCon) with a GRU-based module that lets information about recurring identities flow across scenes, setting new state-of-the-art results in active speaker detection.

Findings

01. Achieved 94.47% mAP on the AVA-ActiveSpeaker test set.
02. Ranked first on the ActivityNet Challenge 2022 leaderboard.
03. Significantly advanced the state of the art in active speaker detection.

Abstract

This report briefly describes our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022. Our underlying model, UniCon+, continues to build on our previous work, the Unified Context Network (UniCon) and Extended UniCon, which are designed for robust scene-level ASD. We augment the architecture with a simple GRU-based module that allows information about recurring identities to flow across scenes through read and update operations. We report a best result of 94.47% mAP on the AVA-ActiveSpeaker test set, which continues to rank first on this year's challenge leaderboard and significantly pushes the state of the art.
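The abstract describes the GRU-based module only at the level of "read and update operations" on recurring identities. The Python sketch below illustrates one way such an identity memory could work, assuming each face track already carries an identity label and a fixed-size feature vector per scene. The names `IdentityMemory`, `read`, `update`, the toy identities, and the dimensions are hypothetical, and the cell is a textbook GRU (Cho et al. form), not code confirmed to match UniCon+.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Standard GRU cell (Cho et al. form); biases omitted for brevity."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Input (w_*) and recurrent (u_*) weights for the update gate z,
        # the reset gate r, and the candidate state n.
        self.w_z, self.u_z = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.w_r, self.u_r = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.w_n, self.u_n = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)

    def __call__(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.w_z, x), matvec(self.u_z, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.w_r, x), matvec(self.u_r, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.w_n, x), matvec(self.u_n, reset_h))]
        # The new state interpolates between the previous state and the candidate.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


class IdentityMemory:
    """Hypothetical per-identity memory carried across scenes (not the UniCon+ code)."""

    def __init__(self, feature_dim: int, hidden_dim: int) -> None:
        self.cell = GRUCell(feature_dim, hidden_dim)
        self.states: Dict[str, Vector] = {}

    def read(self, identity: str) -> Vector:
        # Identities not seen in earlier scenes start from a zero state.
        return list(self.states.get(identity, [0.0] * self.cell.hidden_dim))

    def update(self, identity: str, feature: Vector) -> Vector:
        # Fold this scene's feature for the identity into its stored state.
        new_state = self.cell(feature, self.read(identity))
        self.states[identity] = new_state
        return new_state


# Toy usage: two scenes in which the hypothetical identity "spk_A" recurs.
memory = IdentityMemory(feature_dim=4, hidden_dim=3)
scenes = [
    [("spk_A", [0.2, 0.1, -0.3, 0.5]), ("spk_B", [0.0, 0.4, 0.1, -0.2])],
    [("spk_A", [0.3, 0.0, -0.1, 0.4])],
]
contexts = []
for scene in scenes:
    for identity, feature in scene:
        # In a full model the read state would be fused with scene-level ASD features.
        contexts.append(memory.read(identity))
        memory.update(identity, feature)
```

Because the state is keyed by identity rather than by scene, a speaker who reappears later starts from what the memory accumulated earlier instead of from zeros. How UniCon+ actually associates identities across scenes and fuses the read state is not specified in this summary.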

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems