Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters
Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký

TL;DR
This paper explores parameter-efficient transfer learning for speaker verification by inserting adapters into pre-trained Transformer models. The approach sharply reduces the number of trainable parameters while maintaining performance, and it is particularly beneficial in low-resource, cross-language scenarios.
Contribution
It introduces a parameter-efficient transfer learning (PETL) approach based on adapters for speaker verification. The approach fine-tunes effectively while updating fewer than 4% of the parameters, making it far more parameter-efficient than traditional full fine-tuning.
Findings
Achieves performance comparable to full fine-tuning while updating fewer than 4% of the parameters.
Effective in cross-language low-resource scenarios.
Reduces overfitting on small datasets.
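To put the under-4% figure in perspective, the arithmetic sketch below counts the parameters that bottleneck adapters add to a frozen backbone. Every size in it (backbone parameter count, number of blocks, hidden size, adapters per block, bottleneck width) is a hypothetical value chosen for illustration, not a configuration reported in the paper, and the speaker-verification head is not counted.

```python
# Illustrative arithmetic only: backbone size, depth, hidden size, adapter
# count and bottleneck width below are hypothetical, not the paper's settings.


def adapter_params(d_model: int, bottleneck: int) -> int:
    """Parameters in one bottleneck adapter (down- and up-projection, with biases)."""
    down = d_model * bottleneck + bottleneck  # W_down (r x d) and b_down (r)
    up = bottleneck * d_model + d_model       # W_up (d x r) and b_up (d)
    return down + up


def trainable_fraction(backbone_params: int, n_layers: int,
                       adapters_per_layer: int, d_model: int,
                       bottleneck: int) -> float:
    """Share of all parameters that are trained when the backbone is frozen
    and only the adapters are updated (task head ignored)."""
    added = n_layers * adapters_per_layer * adapter_params(d_model, bottleneck)
    return added / (backbone_params + added)


# Assumed: ~95M-parameter backbone, 12 blocks, hidden size 768,
# two adapters per block, bottleneck width 64.
fraction = trainable_fraction(95_000_000, 12, 2, 768, 64)
print(f"trainable share: {fraction:.2%}")
```

Under these assumed sizes, each adapter adds 2·768·64 + 64 + 768 = 99,136 parameters, and 24 of them come to roughly 2.4% of the total. The share grows approximately linearly with the bottleneck width, which is the main knob trading capacity for parameter efficiency.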
Abstract
Recently, pre-trained Transformer models have attracted rising interest in the field of speech processing thanks to their great success in various downstream tasks. However, most fine-tuning approaches update all the parameters of the pre-trained model, which becomes prohibitive as the model size grows and sometimes results in overfitting on small datasets. In this paper, we conduct a comprehensive analysis of applying parameter-efficient transfer learning (PETL) methods to reduce the number of learnable parameters required to adapt to speaker verification tasks. Specifically, during the fine-tuning process, the pre-trained models are frozen, and only lightweight modules inserted in each Transformer block are trainable (a method known as adapters). Moreover, to boost the performance in a cross-language low-resource scenario, the Transformer model is further tuned on a large intermediate…
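The adapter scheme described in the abstract (a frozen backbone with small trainable modules inside each Transformer block) can be sketched as follows. This is a minimal, dependency-free illustration assuming a Houlsby-style bottleneck adapter: down-projection, ReLU, up-projection, and an internal skip connection. The names `BottleneckAdapter` and `block_with_adapter`, the ReLU nonlinearity, the initialization scheme, and the omission of layer normalization are illustrative assumptions, not details confirmed by the paper.

```python
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Hypothetical adapter: y + W_up · relu(W_down · y + b_down) + b_up."""

    def __init__(self, d_model: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection d_model -> bottleneck, small random init (assumed).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection bottleneck -> d_model, zero init (assumed), so the
        # adapter starts as an identity map and leaves the frozen model intact.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def num_params(self) -> int:
        """Count of trainable values: the only parameters fine-tuning updates."""
        return (sum(len(r) for r in self.w_down) + len(self.b_down)
                + sum(len(r) for r in self.w_up) + len(self.b_up))

    def __call__(self, y: Vector) -> Vector:
        z = [max(0.0, v + b) for v, b in zip(matvec(self.w_down, y), self.b_down)]
        delta = [v + b for v, b in zip(matvec(self.w_up, z), self.b_up)]
        return [yi + di for yi, di in zip(y, delta)]  # adapter's internal skip


def block_with_adapter(h: Vector, frozen_sublayer: Callable[[Vector], Vector],
                       adapter: BottleneckAdapter) -> Vector:
    """Pass the frozen sublayer's output through the trainable adapter, then
    add the block input back (residual); layer normalization omitted."""
    return [hi + ai for hi, ai in zip(h, adapter(frozen_sublayer(h)))]


if __name__ == "__main__":
    d_model, bottleneck = 8, 2
    # Stand-in for a frozen pre-trained sublayer: a fixed diagonal map.
    frozen_w = [[0.1 if i == j else 0.0 for j in range(d_model)]
                for i in range(d_model)]
    adapter = BottleneckAdapter(d_model, bottleneck)
    h = [float(i) for i in range(d_model)]
    out = block_with_adapter(h, lambda x: matvec(frozen_w, x), adapter)
    print(adapter.num_params(), out)
```

Zero-initializing the up-projection is a common choice (assumed here) because the adapter then starts as an identity map, so inserting it does not perturb the frozen model's outputs before training begins. During fine-tuning, only the adapter weights would be handed to the optimizer, while the sublayer weights stay fixed.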
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization
