A Framework for Robust Speaker Verification in Highly Noisy Environments Leveraging Both Noisy and Enhanced Audio
Adam Katav, Yair Moshe, Israel Cohen

TL;DR
This paper introduces a neural network framework that combines noisy and enhanced speech features to improve speaker verification accuracy in highly noisy environments, leveraging a Siamese architecture for robustness.
Contribution
The proposed framework integrates speaker embeddings extracted from both noisy and enhanced speech using a Siamese network, improving robustness without being tied to a specific speech enhancement method.
Findings
Outperforms existing speaker verification methods in noisy conditions
Effective in leveraging both noisy and enhanced speech features
Lightweight and adaptable to various speech enhancement techniques
Abstract
Recent advancements in speaker verification techniques show promise, but their performance often deteriorates significantly in challenging acoustic environments. Although speech enhancement methods can improve perceived audio quality, they may unintentionally distort speaker-specific information, which can affect verification accuracy. This problem has become more noticeable with the increasing use of generative deep neural networks (DNNs) for speech enhancement. While these networks can produce intelligible speech even in conditions of very low signal-to-noise ratio (SNR), they may also severely alter distinctive speaker characteristics. To tackle this issue, we propose a novel neural network framework that effectively combines speaker embeddings extracted from both noisy and enhanced speech using a Siamese architecture. This architecture allows us to leverage complementary information…
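The abstract describes combining speaker embeddings from noisy and enhanced speech with a Siamese (shared-weight) network. However, it is truncated here and does not specify layer sizes, the fusion rule, or the scoring back end. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Both embeddings are assumed to come from some pretrained speaker encoder, one applied to the noisy input and one to the enhanced output. They pass through the same hypothetical branch `shared_branch` (one affine layer plus tanh). The hypothetical `fuse` then sums the two outputs and L2-normalizes the result, and trials are scored with cosine similarity.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def shared_branch(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Hypothetical Siamese branch: one affine layer followed by tanh.

    The same `weights` and `bias` are applied to every input, which is what
    makes the two branches "Siamese" (weight sharing).
    """
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0.0 else list(v)


def fuse(noisy_emb: Vector, enhanced_emb: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fuse noisy and enhanced speaker embeddings into one verification embedding.

    Assumption: element-wise sum of the two branch outputs, then L2
    normalization. The paper's actual fusion rule is not given in this summary.
    """
    a = shared_branch(noisy_emb, weights, bias)
    b = shared_branch(enhanced_emb, weights, bias)
    return l2_normalize([x + y for x, y in zip(a, b)])


def cosine_score(u: Vector, v: Vector) -> float:
    """Cosine similarity in [-1, 1]; higher means more likely the same speaker."""
    return sum(x * y for x, y in zip(l2_normalize(u), l2_normalize(v)))


if __name__ == "__main__":
    rng = random.Random(0)
    in_dim, out_dim = 8, 4  # toy sizes; real speaker embeddings are much larger
    weights = [[rng.gauss(0.0, 0.3) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim

    # Random stand-ins for embeddings a pretrained speaker encoder would produce.
    enroll_noisy = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
    enroll_enh = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
    test_noisy = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
    test_enh = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]

    enroll = fuse(enroll_noisy, enroll_enh, weights, bias)
    test = fuse(test_noisy, test_enh, weights, bias)
    print(f"verification score: {cosine_score(enroll, test):.3f}")
```

Both inputs share one set of weights and are combined by a symmetric sum. As a result, swapping the noisy and enhanced embeddings leaves the fused vector unchanged. A learned, attention-based, or concatenation-based fusion would break that symmetry and could instead weight the two sources differently depending on the input.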
