Bayesian Learning for Domain-Invariant Speaker Verification and Anti-Spoofing

Jin Li; Man-Wai Mak; Johan Rohdin; Kong Aik Lee; Hynek Hermansky

arXiv:2506.07536 · eess.AS · June 10, 2025 · Interspeech

Open Access

TL;DR

This paper introduces Bayesian weighted RFN (BWRFN), which uses variational inference to model the posterior distribution of per-frequency normalization weights in speaker verification and anti-spoofing, improving robustness under domain mismatch.

Contribution

The paper proposes a Bayesian approach to frequency-wise normalization that models the uncertainty of the frequency weights to improve domain invariance in speaker verification and anti-spoofing systems.

Findings

01 BWRFN outperforms RFN and WRFN in cross-dataset ASV tasks.

02 BWRFN improves anti-spoofing robustness against domain shifts.

03 Bayesian modeling effectively captures weight uncertainty, enhancing performance.

Abstract

The performance of automatic speaker verification (ASV) and anti-spoofing degrades severely under real-world domain mismatch. Relaxed instance frequency-wise normalization (RFN), which normalizes the frequency components using feature statistics computed along the time and channel axes, is a promising approach to reducing the domain dependence of the feature maps in a speaker embedding network. We argue that different frequencies should receive different weights and that the uncertainty in these weights caused by domain shift should be accounted for. To this end, we propose using variational inference to model the posterior distribution of the weights, which results in Bayesian weighted RFN (BWRFN). This approach overcomes the limitations of fixed-weight RFN, making it more effective under domain mismatch conditions. Extensive experiments on cross-dataset ASV, cross-TTS…
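The idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The following are assumptions made for illustration only: the feature-map layout (batch, channel, frequency, time), the relaxation of instance frequency-wise normalization by mixing it with layer normalization, the use of a sigmoid to keep each frequency weight in (0, 1), and a Gaussian variational posterior parameterized by `mu` and `rho`. All function names are hypothetical.

```python
import numpy as np

def instance_freq_norm(x, eps=1e-5):
    # x: (batch, channel, freq, time), an assumed layout.
    # Statistics are taken over the channel and time axes,
    # separately for each instance and each frequency bin.
    mean = x.mean(axis=(1, 3), keepdims=True)
    var = x.var(axis=(1, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Statistics over all non-batch axes (assumed relaxation term).
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sample_freq_weights(mu, rho, rng):
    # Reparameterized draw from an assumed Gaussian posterior
    # N(mu, softplus(rho)^2), one weight per frequency bin.
    sigma = np.log1p(np.exp(rho))
    w = mu + sigma * rng.standard_normal(mu.shape)
    # Sigmoid squashing to (0, 1) is an illustrative choice.
    return 1.0 / (1.0 + np.exp(-w))

def bayesian_weighted_rfn(x, mu, rho, rng):
    # Per-frequency mixing weight, broadcast over batch, channel, time.
    lam = sample_freq_weights(mu, rho, rng).reshape(1, 1, -1, 1)
    return lam * layer_norm(x) + (1.0 - lam) * instance_freq_norm(x)

# Usage: 2 utterances, 3 channels, 4 frequency bins, 5 frames.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 5))
mu = np.zeros(4)          # posterior means, one per frequency bin
rho = np.full(4, -3.0)    # small initial posterior standard deviations
y = bayesian_weighted_rfn(x, mu, rho, rng)
```

In a trainable version, `mu` and `rho` would be learned, and variational inference would typically add a KL-divergence penalty between the posterior and a prior to the task loss. At test time one could average several weight samples or use the posterior mean; which of these the paper does is not stated in the text above.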

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques

Methods: Variational Inference