Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

Junyi Peng; Themos Stafylakis; Rongzhi Gu; Oldřich Plchot; Ladislav Mošner; Lukáš Burget; Jan Černocký

arXiv:2210.16032·eess.AS·October 31, 2022

TL;DR

This paper explores parameter-efficient transfer learning for speaker verification using adapters in pre-trained Transformer models, significantly reducing the number of trainable parameters while maintaining performance, especially in low-resource, cross-language scenarios.

Contribution

It introduces a parameter-efficient transfer learning (PETL) approach with adapters for speaker verification, enabling effective fine-tuning while updating less than 4% of the parameters, which makes it more parameter-efficient than traditional full fine-tuning.

Findings

01. Achieves performance comparable to full fine-tuning while updating less than 4% of the parameters.

02. Effective in cross-language low-resource scenarios.

03. Reduces overfitting on small datasets.

Abstract

Recently, pre-trained Transformer models have received rising interest in the field of speech processing thanks to their great success in various downstream tasks. However, most fine-tuning approaches update all the parameters of the pre-trained model, which becomes prohibitive as the model size grows and sometimes results in overfitting on small datasets. In this paper, we conduct a comprehensive analysis of applying parameter-efficient transfer learning (PETL) methods to reduce the required learnable parameters for adapting to speaker verification tasks. Specifically, during the fine-tuning process, the pre-trained models are frozen, and only lightweight modules inserted in each Transformer block are trainable (a method known as adapters). Moreover, to boost the performance in a cross-language low-resource scenario, the Transformer model is further tuned on a large intermediate…
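The adapter idea in the abstract (freeze the pre-trained model, train only small inserted modules) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common residual bottleneck design (down-projection, nonlinearity, up-projection, skip connection), a GELU activation, and a zero-initialised up-projection, none of which are specified on this page. The names `BottleneckAdapter` and `trainable_fraction` and all dimensions are hypothetical.

```python
import math
import random


def gelu(z):
    """Exact GELU activation (assumed here; the paper may use another nonlinearity)."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


class BottleneckAdapter:
    """Hypothetical residual adapter: x + W_up @ gelu(W_down @ x + b_down) + b_up."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection (dim -> bottleneck) with small random weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection (bottleneck -> dim) starts at zero, so the adapter
        # initially leaves the frozen model's output unchanged (an assumption).
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def num_params(self):
        dim, bottleneck = len(self.w_up), len(self.w_down)
        return 2 * dim * bottleneck + dim + bottleneck

    def forward(self, x):
        hidden = [gelu(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w_down, self.b_down)]
        delta = [sum(w * v for w, v in zip(row, hidden)) + b
                 for row, b in zip(self.w_up, self.b_up)]
        # Residual connection: the adapter only adds a learned correction.
        return [xi + di for xi, di in zip(x, delta)]


def trainable_fraction(adapters, frozen_params):
    """Share of all parameters that are trainable when only adapters are updated."""
    trainable = sum(a.num_params() for a in adapters)
    return trainable / (trainable + frozen_params)


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    adapters = [BottleneckAdapter(dim=64, bottleneck=8, seed=i) for i in range(4)]
    print(f"trainable share: {trainable_fraction(adapters, frozen_params=200_000):.2%}")
```

Because each adapter holds only `2 * dim * bottleneck + dim + bottleneck` parameters, the trainable share stays small when the bottleneck is narrow compared with the frozen backbone, which is the mechanism behind the parameter savings described above.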

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization