Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

Hee-Soo Heo; KiHyun Nam; Bong-Jin Lee; Youngki Kwon; Minjae Lee; You Jin Kim; Joon Son Chung

arXiv:2309.14741 · eess.AS · September 27, 2023

Open Access

TL;DR

This paper proposes a novel method for speaker verification that uses session embeddings to compensate for session variability, improving robustness without retraining the core speaker embedding model.

Contribution

Introduces an auxiliary session embedding network, appended to a fixed speaker embedding extractor, that models session variability and compensates for it in the verification score.

Findings

01 Session embeddings improve robustness against session variability.

02 The method achieves better verification accuracy without retraining the main model.

03 Extensive experiments demonstrate effective session compensation.

Abstract

In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor, which remains fixed during this training process. This results in two similarity scores: one for the speaker information and one for the session information. The latter score acts as a compensator for the former, which might be skewed by session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining the embedding extractor.
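The two-score idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how the session score is combined with the speaker score, so the weighted subtraction below, the weight `alpha`, and the function names `cosine` and `compensated_score` are assumptions made for illustration only, not the authors' formulation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compensated_score(spk_a, spk_b, sess_a, sess_b, alpha=0.5):
    """Speaker similarity corrected by session similarity.

    ASSUMPTION: the correction is a weighted subtraction with a
    hypothetical weight `alpha`; the abstract only says the session
    score acts as a compensator for the speaker score.
    """
    s_spk = cosine(spk_a, spk_b)     # score from the fixed speaker extractor
    s_sess = cosine(sess_a, sess_b)  # score from the auxiliary session network
    return s_spk - alpha * s_sess


# Toy embeddings: identical speaker vectors, recorded in the same session
# versus in unrelated sessions.
same_session = compensated_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
diff_session = compensated_score([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Under this assumed rule, a trial whose two recordings share a session is penalized more than one recorded in unrelated sessions, which counteracts the inflation that matching channels can add to the raw speaker score.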

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing