SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition
Tianhao Wang, Lantian Li, Dong Wang

TL;DR
This paper proposes the SE/BN adapter, a parameter-efficient method for domain adaptation in speaker recognition that maintains high performance while tuning only 1% of the parameters. The design is inspired by adapter techniques used with self-supervised pre-trained models.
Contribution
It introduces a novel SE/BN adapter that enables effective domain adaptation by freezing the core speaker encoder and tuning only a small number of parameters, making adaptation far more parameter-efficient than conventional full fine-tuning.
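The "freeze the encoder, tune only the adapter" idea can be illustrated with a minimal sketch in plain Python. The parameter names, the name-matching rule `is_adapter_param`, and the plain SGD update are all hypothetical assumptions for illustration, not the authors' code. In PyTorch the same effect is usually obtained by calling `requires_grad_(False)` on the frozen parameters and passing only the adapter parameters to the optimizer.

```python
# A minimal sketch of "freeze the encoder, tune only the adapter".
# Parameter names and the name-matching rule are hypothetical assumptions.

def is_adapter_param(name: str) -> bool:
    """Treat SE-block and BN parameters as trainable; everything else is frozen."""
    return ".se." in name or ".bn" in name


def sgd_step(params: dict, grads: dict, lr: float = 0.1) -> dict:
    """Apply one plain SGD step to adapter parameters only."""
    updated = {}
    for name, values in params.items():
        if is_adapter_param(name):
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # frozen encoder weight: copied unchanged
    return updated


params = {
    "layer1.conv1.weight": [0.5, -0.2],   # core encoder (frozen)
    "layer1.bn1.weight": [1.0, 1.0],      # BN gamma (trainable)
    "layer1.bn1.bias": [0.0, 0.0],        # BN beta (trainable)
    "layer1.se.fc1.weight": [0.3, 0.1],   # SE excitation layer (trainable)
}
grads = {name: [1.0, 1.0] for name in params}
new_params = sgd_step(params, grads)
n_trainable = sum(len(v) for n, v in params.items() if is_adapter_param(n))
print(n_trainable, new_params["layer1.conv1.weight"], new_params["layer1.bn1.weight"])
```

Even though every parameter receives a gradient here, only the SE and BN entries move; the convolution weight stays exactly as pre-trained.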
Findings
SE/BN adapter outperforms baseline models in speaker recognition tasks.
It achieves results comparable to full fine-tuning while tuning only 1% of the parameters.
Experiments using VoxCeleb for pre-training and four genres from CN-Celeb for adaptation validate its effectiveness.
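To see why tuning only SE and BN parameters keeps the trainable fraction around the 1% level, the arithmetic below counts parameters for one hypothetical residual block: two 3x3 convolutions at 256 channels, two BN layers, and one SE block with reduction ratio 16. These sizes are assumptions chosen for illustration; they are not the configuration reported in the paper.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution without bias."""
    return c_in * c_out * k * k


def bn_params(c: int) -> int:
    """Trainable BN parameters: gamma and beta per channel (running stats are buffers)."""
    return 2 * c


def se_params(c: int, r: int) -> int:
    """Two fully connected layers (c -> c // r -> c), with biases."""
    h = c // r
    return (c * h + h) + (h * c + c)


# Hypothetical block: two 3x3 convs + two BNs at 256 channels, plus one SE block (r = 16).
C = 256
frozen = 2 * conv_params(C, C, 3)
trainable = 2 * bn_params(C) + se_params(C, 16)
fraction = trainable / (frozen + trainable)
print(f"trainable: {trainable}, total: {frozen + trainable}, fraction: {fraction:.2%}")
# prints "trainable: 9488, total: 1189136, fraction: 0.80%"
```

The convolution weights dominate because they grow with the square of the channel count times the kernel area, whereas BN grows linearly with channels and the SE bottleneck is shrunk by the reduction ratio.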
Abstract
Deploying a well-optimized pre-trained speaker recognition model in a new domain often leads to a significant decline in performance. While fine-tuning is a commonly employed solution, it demands ample adaptation data and suffers from parameter inefficiency, rendering it impractical for real-world applications where only limited data is available for model adaptation. Drawing inspiration from the success of adapters in self-supervised pre-trained models, this paper introduces an SE/BN adapter to address this challenge. The core speaker encoder is frozen, and the adapter adjusts the feature maps' weights and activation distributions through trainable squeeze-and-excitation (SE) blocks and batch normalization (BN) layers, hence the name SE/BN adapter. Our experiments, conducted using VoxCeleb for pre-training and four genres from CN-Celeb for adaptation, demonstrate that the SE/BN…
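The two adapter components can be sketched in plain Python on a tiny (channels x frames) feature map: the SE block reweights channels (the "feature maps' weights"), and BN shifts and rescales activations (their "distributions"). This follows the standard SE design (average-pool squeeze, bottleneck FC with ReLU, FC with sigmoid, channel rescaling) and inference-mode BN with fixed running statistics. Where the blocks sit in the encoder, the reduction ratio, whether BN statistics are re-estimated during adaptation, and all numeric values are assumptions, not details taken from the paper.

```python
import math


def squeeze_excite(x, w1, b1, w2, b2):
    """SE block on a (channels x frames) feature map; returns rescaled map and channel weights."""
    # Squeeze: average each channel over time to get a channel descriptor z.
    z = [sum(row) / len(row) for row in x]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid -> one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(w_row, z)) + b)
              for w_row, b in zip(w1, b1)]
    scales = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(w_row, hidden)) + b)))
              for w_row, b in zip(w2, b2)]
    # Scale: reweight every channel of the feature map by its excitation weight.
    return [[s * v for v in row] for row, s in zip(x, scales)], scales


def batch_norm_inference(x, mean, var, gamma, beta, eps=1e-5):
    """Per-channel BN with fixed running statistics followed by the affine (gamma, beta)."""
    return [[g * (v - m) / math.sqrt(s + eps) + b for v in row]
            for row, m, s, g, b in zip(x, mean, var, gamma, beta)]


# Toy feature map: 2 channels x 3 frames (all values are illustrative).
x = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]]
se_out, channel_weights = squeeze_excite(
    x, w1=[[0.5, -0.5]], b1=[0.0], w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
bn_out = batch_norm_inference(
    se_out, mean=[1.0, 0.5], var=[1.0, 1.0], gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(channel_weights)
print(bn_out)
```

In an adapter setting, `w1`, `b1`, `w2`, `b2`, `gamma`, and `beta` would be the tuned quantities, while the convolutions producing `x` stay frozen.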
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
