Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
Zhaokai Sun, Li Zhang, Qing Wang, Pan Zhou, Lei Xie

TL;DR
This paper introduces a speaker-aware progressive approach for overlapping speech detection (OSD) that leverages self-supervised learning (SSL) models and a speaker attention module, achieving state-of-the-art results on the AMI multi-party meeting corpus.
Contribution
It proposes a novel progressive training strategy, combined with speaker-aware features and SSL representations, to improve overlapping speech detection accuracy.
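The abstract describes the progressive strategy as strengthening the link between subtasks such as voice activity detection (VAD) and overlap detection. Both targets can be derived from the same per-speaker frame activity, which is why they are correlated: every overlap frame is also a speech frame. The following is a minimal sketch, assuming binary per-speaker frame labels as input; the function names and input format are hypothetical, and the sketch does not reproduce the paper's actual training schedule.

```python
from typing import List, Tuple


def derive_subtask_labels(activity: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Derive frame-level VAD and overlap (OSD) targets from speaker activity.

    activity[s][t] is 1 if speaker s talks in frame t, else 0
    (hypothetical format, e.g. rasterized from reference segments).
    """
    num_frames = len(activity[0])
    vad, osd = [], []
    for t in range(num_frames):
        active = sum(speaker[t] for speaker in activity)  # speakers talking in frame t
        vad.append(1 if active >= 1 else 0)  # speech: at least one speaker
        osd.append(1 if active >= 2 else 0)  # overlap: two or more speakers
    return vad, osd


def is_nested(vad: List[int], osd: List[int]) -> bool:
    """Check that every overlap frame is also a speech frame."""
    return all(v >= o for v, o in zip(vad, osd))


if __name__ == "__main__":
    activity = [
        [0, 1, 1, 1, 0, 0],  # speaker A
        [0, 0, 1, 1, 1, 0],  # speaker B
    ]
    vad, osd = derive_subtask_labels(activity)
    print(vad)  # [0, 1, 1, 1, 1, 0]
    print(osd)  # [0, 0, 1, 1, 0, 0]
    print(is_nested(vad, osd))  # True
```

Because overlap labels are a strict subset of speech labels, a model that first learns where speech occurs has a natural starting point for learning where overlap occurs; how the paper stages this is not detailed in this summary.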
Findings
Achieved an F1 score of 82.76% on the AMI test set.
Demonstrated robust and effective overlapping speech detection in multi-party conversations.
Reported state-of-the-art OSD performance, outperforming existing methods.
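The headline metric is F1, the harmonic mean of precision and recall over overlap frames. The summary does not state the paper's evaluation protocol (frame rate, collars, or segment handling), so the following is only a generic frame-level sketch; `frame_f1` is a hypothetical helper, not the authors' scoring script.

```python
from typing import Sequence


def frame_f1(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """Frame-level F1 for binary overlap labels (1 = overlap, 0 = no overlap)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == 1 and h == 1)  # overlap correctly detected
    fp = sum(1 for r, h in pairs if r == 0 and h == 1)  # false alarm
    fn = sum(1 for r, h in pairs if r == 1 and h == 0)  # missed overlap
    if tp == 0:
        # Convention used here: no true positives gives F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = [0, 1, 1, 1, 0, 0, 1, 0]
    hyp = [0, 1, 1, 0, 0, 1, 1, 0]
    print(frame_f1(ref, hyp))  # 0.75
```

In this toy example there are 3 hits, 1 false alarm, and 1 miss, so precision and recall are both 0.75; a reported F1 of 82.76% is the same quantity expressed as a percentage.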
Abstract
Overlapping Speech Detection (OSD) aims to identify regions where multiple speakers overlap in a conversation, a critical challenge in multi-party speech processing. This work proposes a speaker-aware progressive OSD model that leverages a progressive training strategy to enhance the correlation between subtasks such as voice activity detection (VAD) and overlap detection. To improve acoustic representation, we explore the effectiveness of state-of-the-art self-supervised learning (SSL) models, including WavLM and wav2vec 2.0, while incorporating a speaker attention module to enrich features with frame-level speaker information. Experimental results show that the proposed method achieves state-of-the-art performance, with an F1 score of 82.76% on the AMI test set, demonstrating its robustness and effectiveness in OSD.
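The summary does not describe the speaker attention module's internals. The sketch below assumes standard scaled dot-product attention in which each SSL frame feature (the query) attends over a set of speaker embeddings (keys and values), and the resulting speaker context is concatenated to the frame. The function names, the concatenation, the omission of learned query/key/value projections, and the source of the speaker embeddings (for example a pretrained speaker encoder) are all assumptions for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)  # subtract the max before exponentiating
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def speaker_attention(frames: List[Vector], speakers: List[Vector]) -> List[Vector]:
    """Enrich each frame with an attention-weighted summary of speaker embeddings.

    frames:   T frame-level features (queries), each of dimension d.
    speakers: S speaker embeddings (keys and values), each of dimension d.
    Returns T vectors of dimension 2d: [frame ; speaker context].
    Hypothetical simplification: learned projections are identity.
    """
    if not speakers:
        raise ValueError("need at least one speaker embedding")
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    enriched = []
    for q in frames:
        # Scaled dot-product score between this frame and each speaker.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in speakers]
        weights = softmax(scores)
        # Weighted sum of speaker embeddings, one value per feature dimension.
        context = [sum(w * v[j] for w, v in zip(weights, speakers)) for j in range(d)]
        enriched.append(q + context)  # concatenate frame and speaker context
    return enriched


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]
    speakers = [[1.0, 0.0], [0.0, 1.0]]
    out = speaker_attention(frames, speakers)
    print(len(out), len(out[0]))  # 2 4
```

Each frame leans toward the speaker embedding it is most similar to, so the appended context carries frame-level speaker information; a trained module would learn the projections and the speaker representations rather than use them as given here.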
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Emotion and Mood Recognition
Methods: Softmax · Attention Is All You Need
