SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition

Tianhao Wang; Lantian Li; Dong Wang

arXiv:2406.07832·cs.SD·June 13, 2024

TL;DR

This paper proposes the SE/BN adapter, a parameter-efficient method for domain adaptation in speaker recognition. Inspired by adapter techniques for self-supervised pre-trained models, it reaches performance comparable to full fine-tuning while tuning only 1% of the parameters.

Contribution

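The paper introduces the SE/BN adapter, which adapts a pre-trained speaker recognition model to a new domain by freezing the core speaker encoder and training only squeeze-and-excitation (SE) blocks and batch normalization (BN) layers. This makes adaptation more parameter-efficient than conventional fine-tuning.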

Findings

01. The SE/BN adapter outperforms baseline models in speaker recognition tasks.

02. It achieves results comparable to full fine-tuning while tuning only 1% of the parameters.

03. Experiments on the VoxCeleb and CN-Celeb datasets validate its effectiveness.
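
To give a feel for why training only SE and BN parameters can amount to a small share of a model, the sketch below counts parameters for a single residual block. Every size in it (256 channels, reduction factor 16, two 3x3 convolutions per block) is a hypothetical choice made for illustration, not the configuration used in the paper, and the resulting share is not meant to reproduce the 1% figure reported above.

```python
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias."""
    return c_in * c_out * k * k


def bn_params(channels):
    """Trainable scale (gamma) and shift (beta), one pair per channel."""
    return 2 * channels


def se_params(channels, reduction):
    """Two fully connected layers with biases around a bottleneck."""
    hidden = channels // reduction
    return (channels * hidden + hidden) + (hidden * channels + channels)


def trainable_fraction(frozen, trainable):
    """Share of all parameters that would be updated during adaptation."""
    return trainable / (frozen + trainable)


# Hypothetical residual block: two 3x3 convolutions (frozen),
# two BN layers and one SE block (trainable).
channels, reduction = 256, 16
frozen = 2 * conv_params(channels, channels, 3)
trainable = 2 * bn_params(channels) + se_params(channels, reduction)
print(f"trainable share of this block: {trainable_fraction(frozen, trainable):.2%}")
```

The convolution weights grow with the square of the channel count times the kernel area, while BN grows linearly and SE is shrunk by the reduction factor, which is why the trainable share stays small under these assumptions.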

Abstract

Deploying a well-optimized pre-trained speaker recognition model in a new domain often leads to a significant decline in performance. While fine-tuning is a commonly employed solution, it demands ample adaptation data and suffers from parameter inefficiency, rendering it impractical for real-world applications with limited data available for model adaptation. Drawing inspiration from the success of adapters in self-supervised pre-trained models, this paper introduces an SE/BN adapter to address this challenge. By freezing the core speaker encoder and adjusting the feature maps' weights and activation distributions, we introduce a novel adapter utilizing trainable squeeze-and-excitation (SE) blocks and batch normalization (BN) layers, termed the SE/BN adapter. Our experiments, conducted using VoxCeleb for pre-training and 4 genres from CN-Celeb for adaptation, demonstrate that the SE/BN…
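
The mechanism the abstract describes (a frozen encoder whose feature maps are reweighted by SE blocks and whose activation distributions are shifted by BN layers) can be illustrated with a minimal pure-Python sketch. This is an assumption-laden toy, not the authors' implementation: the 1x1 convolution, the conv → BN → ReLU → SE ordering, the sizes `C`, `R`, `T`, and the names `frozen`, `trainable`, and `adapted_block` are all hypothetical. The BN here normalizes over the frames of one feature map as a stand-in for batch statistics.

```python
import math
import random


def se_reweight(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation: rescale each channel by a gate in (0, 1)."""
    # Squeeze: average each channel over its frames.
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)) + b)
              for row, b in zip(w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    return [[g * v for v in ch] for g, ch in zip(gates, feature_map)]


def batch_norm(feature_map, gamma, beta, eps=1e-5):
    """Per-channel normalization followed by a trainable scale and shift."""
    out = []
    for ch, g, b in zip(feature_map, gamma, beta):
        mean = sum(ch) / len(ch)
        var = sum((v - mean) ** 2 for v in ch) / len(ch)
        out.append([g * (v - mean) / math.sqrt(var + eps) + b for v in ch])
    return out


def frozen_conv(feature_map, weight):
    """1x1 convolution: out[c][t] = sum over k of weight[c][k] * in[k][t]."""
    frames = len(feature_map[0])
    return [[sum(w * feature_map[k][t] for k, w in enumerate(row))
             for t in range(frames)] for row in weight]


random.seed(0)
C, R, T = 4, 2, 6  # channels, SE bottleneck width, frames (all hypothetical)

# Pre-trained encoder weights: never updated during adaptation.
frozen = {"conv": [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C)]}
# Adapter parameters: the only values an optimizer would update.
trainable = {
    "gamma": [1.0] * C, "beta": [0.0] * C,
    "w1": [[random.uniform(-1, 1) for _ in range(C)] for _ in range(R)],
    "b1": [0.0] * R,
    "w2": [[random.uniform(-1, 1) for _ in range(R)] for _ in range(C)],
    "b2": [0.0] * C,
}


def adapted_block(x):
    h = frozen_conv(x, frozen["conv"])                         # frozen weights
    h = batch_norm(h, trainable["gamma"], trainable["beta"])   # shifts distribution
    h = [[max(0.0, v) for v in ch] for ch in h]                # ReLU
    return se_reweight(h, trainable["w1"], trainable["b1"],    # reweights channels
                       trainable["w2"], trainable["b2"])


x = [[random.gauss(0, 1) for _ in range(T)] for _ in range(C)]
y = adapted_block(x)
print(len(y), len(y[0]))
```

Keeping the frozen and trainable parameters in separate dictionaries mirrors the idea that adaptation touches only the adapter values while the pre-trained weights stay as they are.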

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing