Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings

Jason Clarke; Yoshihiko Gotoh; Stefan Goetze

arXiv:2506.18055·cs.MM·June 24, 2025

TL;DR

This paper introduces SL-ASD, a novel face-voice association framework for audiovisual active speaker detection in egocentric recordings that matches or exceeds synchronisation-based methods under challenging conditions.

Contribution

The work presents a new system that relies solely on face-voice associations rather than audiovisual synchronisation, and demonstrates its effectiveness in egocentric scenarios.

Findings

01

Achieves performance comparable to or better than synchronisation-based methods.

02

Uses fewer learnable parameters, increasing efficiency.

03

Validates face-voice association as a viable alternative in challenging conditions.

Abstract

Audiovisual active speaker detection (ASD) is conventionally performed by modelling the temporal synchronisation of acoustic and visual speech cues. In egocentric recordings, however, the efficacy of synchronisation-based methods is compromised by occlusions, motion blur, and adverse acoustic conditions. In this work, a novel framework is proposed that exclusively leverages cross-modal face-voice associations to determine speaker activity. An existing face-voice association model is integrated with a transformer-based encoder that aggregates facial identity information by dynamically weighting each frame based on its visual quality. This system is then coupled with a front-end utterance segmentation method, producing a complete ASD system. This work demonstrates that the proposed system, Self-Lifting for audiovisual active speaker detection (SL-ASD), achieves performance comparable to,…
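The idea of weighting frames by visual quality before comparing a face track with a voice can be illustrated with a minimal sketch. This is not the SL-ASD implementation: the paper uses a transformer-based encoder and an existing face-voice association model, whereas the sketch below assumes that per-frame face embeddings, per-frame quality scores, and an utterance-level voice embedding in a shared space are already available. It substitutes a plain softmax-weighted average for the encoder. All function names and the threshold value are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw per-frame quality scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_face_embeddings(frame_embeddings, quality_scores):
    """Quality-weighted average of per-frame face embeddings.

    Assumption: a simple stand-in for the paper's transformer-based
    encoder, which learns its frame weighting.
    """
    weights = softmax(quality_scores)
    dim = len(frame_embeddings[0])
    return [
        sum(w * emb[i] for w, emb in zip(weights, frame_embeddings))
        for i in range(dim)
    ]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_active_speaker(frame_embeddings, quality_scores, voice_embedding,
                      threshold=0.5):
    """Label a face track as speaking if its aggregated identity embedding
    is close enough to the voice embedding of the utterance.

    The threshold is an illustrative value, not one taken from the paper.
    """
    face = aggregate_face_embeddings(frame_embeddings, quality_scores)
    return cosine_similarity(face, voice_embedding) >= threshold
```

In this sketch a blurred or occluded frame with a low quality score contributes little to the aggregated identity, so a few clean frames are enough to associate the face track with the voice of a segmented utterance.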

Code & Models

Repositories

jclarke98/sl_asd
PyTorch · Official
