LASER: Lip Landmark Assisted Speaker Detection for Robustness
Le Thien Phuc Nguyen, Zhuoran Yu, Yong Jae Lee

TL;DR
LASER enhances active speaker detection by explicitly using lip landmarks during training, improving robustness against low resolution, occlusion, and background noise without requiring landmarks at test time.
Contribution
Introduces LASER, a novel method that incorporates lip landmarks into training for more robust speaker detection, and creates LASER-bench to evaluate performance under noisy conditions.
Findings
LASER outperforms state-of-the-art models on multiple benchmarks.
LASER improves detection accuracy in high-noise environments.
The auxiliary consistency loss improves robustness without adding test-time cost, since no landmark detector is needed at inference.
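The auxiliary loss can be illustrated with a minimal NumPy sketch. According to the abstract, LASER produces a lip-aware prediction (which uses the landmark maps) and a face-only prediction (which does not), and the auxiliary loss aligns the two so that only the face-only path is needed at test time. Everything concrete below is an assumption for illustration, not the paper's published loss: per-frame binary speaking logits, Bernoulli KL divergence as the alignment measure with the lip-aware output as the target, binary cross-entropy on both branches, and the hypothetical weight `lam`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bernoulli_kl(p, q, eps=1e-7):
    """Element-wise KL(p || q) between Bernoulli distributions."""
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))


def bce(logits, labels, eps=1e-7):
    """Mean binary cross-entropy of per-frame speaking logits."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    return float(-(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)).mean())


def consistency_loss(lip_aware_logits, face_only_logits):
    """Hypothetical consistency term: mean per-frame KL from the lip-aware
    prediction (treated as the target) to the face-only prediction."""
    p = sigmoid(np.asarray(lip_aware_logits, dtype=np.float64))
    q = sigmoid(np.asarray(face_only_logits, dtype=np.float64))
    return float(bernoulli_kl(p, q).mean())


def training_loss(lip_aware_logits, face_only_logits, labels, lam=1.0):
    """Supervise both branches and tie them together (hypothetical weighting)."""
    lip = np.asarray(lip_aware_logits, dtype=np.float64)
    face = np.asarray(face_only_logits, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    return bce(lip, y) + bce(face, y) + lam * consistency_loss(lip, face)


# Toy face track of four frames: 1 = speaking, 0 = not speaking.
labels = [1, 1, 0, 0]
lip_logits = [3.0, 2.5, -2.0, -3.0]
face_logits = [1.0, 0.5, 0.5, -1.0]
print(consistency_loss(lip_logits, lip_logits))  # 0.0
print(training_loss(lip_logits, face_logits, labels) > 0)  # True
```

In an autodiff framework, one plausible choice is to stop gradients through the lip-aware target so the consistency term only pulls the face-only branch toward it; that detail is likewise an assumption here.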
Abstract
Active Speaker Detection (ASD) aims to identify who is speaking in complex visual scenes. While humans naturally rely on lip-audio synchronization, existing ASD models often misclassify non-speaking instances when lip movements and audio are unsynchronized. To address this, we propose Lip landmark Assisted Speaker dEtection for Robustness (LASER), which explicitly incorporates lip landmarks during training to guide the model's attention to speech-relevant regions. Given a face track, LASER extracts visual features and encodes 2D lip landmarks into dense maps. To handle failure cases such as low resolution or occlusion, we introduce an auxiliary consistency loss that aligns lip-aware and face-only predictions, removing the need for landmark detectors at test time. LASER outperforms state-of-the-art models across both in-domain and out-of-domain benchmarks. To further evaluate robustness…
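The abstract states that LASER encodes 2D lip landmarks into dense maps but does not specify the encoding here. A common way to turn sparse keypoints into a dense map is to render each point as a 2D Gaussian, and the NumPy sketch below does exactly that as an assumption, not as the paper's method. Landmarks are assumed to be (x, y) pixel coordinates in the face crop; `sigma`, the max-merge of all points into a single channel, and the function names are hypothetical.

```python
import numpy as np


def landmarks_to_dense_map(landmarks, height, width, sigma=2.0):
    """Render K lip landmarks of shape (K, 2), given as (x, y) pixels,
    into one (height, width) map with values in [0, 1]."""
    landmarks = np.asarray(landmarks, dtype=np.float64)
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index grids, each (H, W)
    # Offsets from every pixel to every landmark, shape (K, H, W)
    dx = xs[None, :, :] - landmarks[:, 0, None, None]
    dy = ys[None, :, :] - landmarks[:, 1, None, None]
    gaussians = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    # Per-pixel maximum keeps overlapping landmarks bounded in [0, 1]
    return gaussians.max(axis=0)


def track_to_dense_maps(track_landmarks, height, width, sigma=2.0):
    """Apply the per-frame encoding to a face track of shape (T, K, 2)."""
    return np.stack(
        [landmarks_to_dense_map(frame, height, width, sigma) for frame in track_landmarks]
    )


lips = np.array([[10.0, 20.0], [20.0, 22.0], [30.0, 20.0]])  # toy mouth points
dense = landmarks_to_dense_map(lips, height=48, width=48)
print(dense.shape)  # (48, 48)
print(dense[20, 10])  # 1.0 (row y=20, column x=10 is a landmark)
track = np.stack([lips, lips + 1.0])  # two-frame toy track
print(track_to_dense_maps(track, 48, 48).shape)  # (2, 48, 48)
```

Dropping the `max` and keeping one channel per landmark is an equally plausible design; the single-channel version is shown only because it is compact.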
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis
Methods: ALIGN
