Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

Ju-ho Kim; Jungwoo Heo; Hyun-seo Shin; Chan-yeong Lim; Ha-Jin Yu

arXiv:2309.08320 · eess.AS · December 20, 2023


TL;DR

Diff-SV is a hierarchical framework that combines a diffusion probabilistic model (DPM) based speech enhancement front-end with a speaker embedding extractor. It improves the noise robustness of speaker verification and outperforms recent noise-robust SV systems under both in-domain and out-of-domain noise.

Contribution

This paper presents a novel hierarchical diffusion probabilistic model for noise-robust speaker verification, integrating speech enhancement and speaker embedding extraction in a unified framework.
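Speaker verification ends by comparing two speaker embeddings and accepting or rejecting the trial. The sketch below shows only that final decision step, assuming cosine-similarity scoring, which is a common SV back-end. This page does not say which scoring Diff-SV uses. The function names, the 0.5 threshold, and the 192-dimensional toy embeddings are all hypothetical.

```python
import numpy as np


def cosine_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings (hypothetical helper)."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))


def verify(emb_enroll: np.ndarray, emb_test: np.ndarray, threshold: float = 0.5) -> bool:
    """Accept the trial as same-speaker if the cosine score reaches the threshold.

    The default threshold is a placeholder, not a value taken from Diff-SV.
    """
    return cosine_score(emb_enroll, emb_test) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enroll = rng.standard_normal(192)                       # toy enrollment embedding
    same_speaker = enroll + 0.1 * rng.standard_normal(192)  # small perturbation of it
    other_speaker = rng.standard_normal(192)                # unrelated embedding
    print(verify(enroll, same_speaker), verify(enroll, other_speaker))
```

In practice the threshold is tuned on held-out trials, for example at the equal-error-rate operating point, rather than fixed in advance.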

Findings

1. Achieves state-of-the-art performance in noisy conditions.
2. Outperforms recent noise-robust SV systems.
3. Effective across in-domain and out-of-domain noise scenarios.

Abstract

Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages DPMs. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerant speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. The obtained experimental results demonstrate that Diff-SV achieves state-of-the-art performance,…
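Diff-SV's enhancement front-end is a DPM. This page does not give its noise schedule or parameterization, so the sketch below uses the standard DDPM formulation instead. It implements the forward noising process x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, the inversion that recovers x0 from a noise estimate, and the usual noise-prediction loss. It is not the Diff-SV implementation; the official repository covers that. The linear schedule values, the step count, and all function names are hypothetical.

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 50, beta_start: float = 1e-4, beta_end: float = 0.05) -> np.ndarray:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule.

    The schedule values are hypothetical placeholders.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, noise):
    """Forward (noising) process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def predict_x0(xt, t, alpha_bar, eps_hat):
    """Estimate the clean signal from x_t given a noise estimate eps_hat (e.g. from a trained network)."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])


def noise_prediction_loss(eps, eps_hat) -> float:
    """Standard DDPM training objective: mean squared error between true and predicted noise."""
    return float(np.mean((eps - eps_hat) ** 2))
```

When eps_hat equals the true noise, predict_x0 returns x0 exactly. A trained network only approximates the noise. In score-based terms, the predicted noise relates to the score of the noisy data by grad log p(x_t) ≈ -eps_hat / sqrt(1 - alpha_bar_t).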


Code & Models

Repositories

wngh1187/diff-sv
PyTorch · Official


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Diffusion