Speaker detection in the wild: Lessons learned from JSALT 2019
Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang

TL;DR
This paper investigates speaker detection in challenging real-world scenarios and shows that diarization is a crucial step for improving detection accuracy across diverse conditions.
Contribution
It demonstrates that incorporating diarization as a preliminary stage significantly enhances speaker detection performance in adverse environments.
Findings
Diarization improves speaker detection accuracy.
Effective handling of noisy signals is crucial.
Domain mismatch impacts detection performance.
Abstract
This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios. The main focus was to tackle a wide range of conditions, from meetings to speech in the wild. We describe the research threads we explored and a set of modules that were successful in these scenarios. The ultimate goal was to explore speaker detection, but our first finding was that effective diarization improves detection, while omitting the diarization stage degrades performance. All the configurations in our research agree on this point and follow a common backbone that includes diarization as a preceding stage. With this backbone, we analyzed the following problems: voice activity detection, how to deal with noisy signals, domain mismatch, how to improve the clustering, and the overall impact of previous stages in…
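To make the "diarization as a preceding stage" idea concrete, here is a minimal sketch, not the workshop's actual recipe. It assumes that segment-level speaker embeddings and diarization cluster labels are already available and that detection is scored with cosine similarity against an enrollment embedding. The per-cluster averaging, the max-over-clusters decision rule, and all function names (`detect_with_diarization`, `detect_without_diarization`) are hypothetical choices made for illustration. The point it shows is that pooling every segment of a multi-speaker recording dilutes the target speaker, whereas scoring each diarized speaker separately does not.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def detect_without_diarization(segments: List[Vector], enroll: Vector) -> float:
    """Hypothetical baseline: pool all segments into one embedding and score it."""
    return cosine(mean(segments), enroll)


def detect_with_diarization(
    segments: List[Vector], labels: List[int], enroll: Vector
) -> float:
    """Hypothetical backbone: group segments by diarization label,
    average each speaker's embeddings, score every speaker against
    the enrollment, and keep the best-matching speaker's score."""
    clusters: Dict[int, List[Vector]] = {}
    for emb, label in zip(segments, labels):
        clusters.setdefault(label, []).append(emb)
    return max(cosine(mean(embs), enroll) for embs in clusters.values())


if __name__ == "__main__":
    # Toy 2-D embeddings: the target speaker talks in one segment,
    # another speaker dominates the other three.
    segments = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
    labels = [0, 1, 1, 1]
    enroll = [1.0, 0.0]
    print(round(detect_without_diarization(segments, enroll), 3))  # 0.316
    print(detect_with_diarization(segments, labels, enroll))  # 1.0
```

In this toy recording the pooled score drops to about 0.316 because the non-target speaker dominates the average, while the diarized score recovers the target speaker's full similarity. Real systems would use learned embeddings (e.g. x-vectors) and a trained backend such as PLDA rather than raw cosine scoring.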
