Deep Representation Decomposition for Rate-Invariant Speaker Verification

Fuchuan Tong; Siqi Zheng; Haodong Zhou; Xingjia Xie; Qingyang Hong; Lin Li

arXiv:2205.14294·eess.AS·May 31, 2022·Odyssey

Open Access

TL;DR

This paper introduces a deep learning method that uses adversarial training to decompose speaker embeddings into an identity-related component and a rate-related component, yielding speaking rate-invariant embeddings that improve speaker verification under speaking rate mismatch.

Contribution

The paper proposes a deep representation decomposition approach with adversarial learning that learns speaking rate-invariant speaker embeddings, reducing the intra-class discrepancy that speaking rate mismatch causes in speaker verification.

Findings

01 Improved verification accuracy on the VoxCeleb1 and HI-MIA datasets.

02 Effective reduction of speaking rate influence on speaker embeddings.

03 Demonstrated robustness of identity features against speaking rate variations.

Abstract

While deep speaker embeddings have achieved promising performance for speaker verification, this advantage diminishes under speaking-style variability. Speaking rate mismatch is often observed in practical speaker verification systems and can degrade system performance. To reduce the intra-class discrepancy caused by speaking rate, we propose a deep representation decomposition approach with adversarial learning to learn speaking rate-invariant speaker embeddings. Specifically, adopting an attention block, we decompose the original embedding into an identity-related component and a rate-related component through multi-task training. Additionally, to reduce the latent relationship between the two decomposed components, we further propose a cosine mapping block to train the parameters adversarially to minimize the cosine similarity between the two decomposed…
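
The decomposition idea in the abstract can be illustrated with a small sketch. It is not the authors' architecture: the element-wise sigmoid mask, the complementary split, and the names `decompose`, `gate_logits`, and `decorrelation_penalty` are all assumptions made for illustration. The sketch only computes the kind of cosine-similarity quantity that such training would try to drive down; it does not implement the attention block, the multi-task heads, or the adversarial training itself.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used here as a soft gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decompose(embedding, gate_logits):
    """Split an embedding into two parts with a soft element-wise mask.

    Assumption (not from the paper): the mask is sigmoid(gate_logits),
    the identity part is mask * e and the rate part is (1 - mask) * e,
    so the two parts always sum back to the original embedding.
    """
    mask = [sigmoid(g) for g in gate_logits]
    identity_part = [m * e for m, e in zip(mask, embedding)]
    rate_part = [(1.0 - m) * e for m, e in zip(mask, embedding)]
    return identity_part, rate_part


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two vectors (eps avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def decorrelation_penalty(identity_part, rate_part):
    """Absolute cosine similarity; it is zero when the parts are orthogonal."""
    return abs(cosine_similarity(identity_part, rate_part))


# Toy example with a random 8-dimensional embedding and random gate logits.
rng = random.Random(0)
embedding = [rng.gauss(0.0, 1.0) for _ in range(8)]
gate_logits = [rng.gauss(0.0, 1.0) for _ in range(8)]
identity_part, rate_part = decompose(embedding, gate_logits)
penalty = decorrelation_penalty(identity_part, rate_part)
```

Under this assumed split, a nearly hard mask (gate values close to 0 or 1) assigns each dimension to one component, which pushes the penalty toward zero. In a real system the gate would be produced by a learned network and the penalty would be one term among the training objectives.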

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing