Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition

Bethan Thomas; Samuel Kessler; Salah Karout

arXiv:2202.03218 · cs.CL · February 8, 2022

TL;DR

This paper applies adapter modules to self-supervised speech models such as wav2vec 2.0, enabling efficient transfer to automatic speech recognition (ASR) by training far fewer parameters per task while largely maintaining performance.

Contribution

It demonstrates that applying adapters to wav2vec 2.0 reduces the number of trainable parameters needed per ASR task, improving scalability across tasks and languages with minimal performance loss.

Findings

1. Adapters enable ASR while training fewer than 10% of the parameters per task required by full fine-tuning.
2. Applying adapters only to the top layers yields performance similar to full transfer.
3. Adapters improve scalability for multi-task and multilingual speech recognition.
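As a rough illustration of where the parameter savings come from, the sketch below counts trainable parameters when the pre-trained backbone is frozen and only bottleneck adapters plus a linear output layer are trained. The adapter layout (LayerNorm, down-projection, up-projection), the bottleneck size of 256, the 32-symbol output vocabulary, one adapter per layer, and the "top 4 layers" placement are hypothetical choices, not the paper's configuration; the ~95M total is the commonly cited size of wav2vec 2.0 BASE (12 Transformer layers, hidden size 768).

```python
def adapter_params(d_model: int, bottleneck: int) -> int:
    """Parameters in one hypothetical bottleneck adapter."""
    layer_norm = 2 * d_model                     # gain + bias
    down = d_model * bottleneck + bottleneck     # down-projection weight + bias
    up = bottleneck * d_model + d_model          # up-projection weight + bias
    return layer_norm + down + up


def trainable_params(n_adapted_layers: int, d_model: int, bottleneck: int,
                     vocab: int, adapters_per_layer: int = 1) -> int:
    """Adapters in the chosen layers plus a linear output head; backbone frozen."""
    adapters = n_adapted_layers * adapters_per_layer * adapter_params(d_model, bottleneck)
    head = d_model * vocab + vocab  # hypothetical linear output layer over the vocabulary
    return adapters + head


TOTAL_BASE = 95_000_000                    # approximate size of wav2vec 2.0 BASE
D_MODEL, BOTTLENECK, VOCAB = 768, 256, 32  # BOTTLENECK and VOCAB are hypothetical

for layers in (12, 4):  # adapters in all 12 layers vs. only the top 4 (hypothetical)
    n = trainable_params(layers, D_MODEL, BOTTLENECK, VOCAB)
    print(f"adapters in {layers:2d} layers: {n:,} trainable ({n / TOTAL_BASE:.1%})")
# adapters in 12 layers: 4,773,920 trainable (5.0%)
# adapters in  4 layers: 1,607,712 trainable (1.7%)
```

Under these assumptions the adapter cost scales linearly with the number of adapted layers, so restricting adapters to the upper layers shrinks the per-task budget further.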

Abstract

Self-supervised learning (SSL) is a powerful tool that allows learning of underlying representations from unlabeled data. Transformer-based models such as wav2vec 2.0 and HuBERT are leading the field in the speech domain. Generally, these models are fine-tuned on a small amount of labeled data for a downstream task such as Automatic Speech Recognition (ASR). This involves re-training the majority of the model for each task. Adapters are small, lightweight modules which are commonly used in Natural Language Processing (NLP) to adapt pre-trained models to new tasks. In this paper we propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks, and increase scalability of the model to multiple tasks or languages. Using adapters we can perform ASR while training fewer than 10% of parameters per task compared to full fine-tuning with little…
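For readers unfamiliar with adapters, here is a minimal sketch of a generic bottleneck adapter, written in dependency-free Python for readability. It assumes a common layout (LayerNorm, down-projection to a small bottleneck, GELU, up-projection back to the model dimension, residual connection); during transfer, only parameters like these would be trained while the wav2vec 2.0 backbone stays frozen. The class name `BottleneckAdapter`, the GELU nonlinearity, and the zero-initialised up-projection (so the adapter starts as an identity map) are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def layer_norm(x, gamma, beta, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


def gelu(v):
    # tanh approximation of GELU (assumed nonlinearity)
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def matvec(w, x, bias):
    # w has shape (out, in); returns w @ x + bias
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, bias)]


class BottleneckAdapter:
    """Hypothetical adapter applied to one hidden vector of size d_model."""

    def __init__(self, d_model, bottleneck, seed=0):
        rng = random.Random(seed)
        self.gamma = [1.0] * d_model
        self.beta = [0.0] * d_model
        # Down-projection: (bottleneck, d_model), small random init
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection: (d_model, bottleneck), zero init so the adapter starts as identity
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def num_params(self):
        d, m = len(self.gamma), len(self.b_down)
        return 2 * d + (d * m + m) + (m * d + d)

    def __call__(self, h):
        z = layer_norm(h, self.gamma, self.beta)
        z = [gelu(v) for v in matvec(self.w_down, z, self.b_down)]
        z = matvec(self.w_up, z, self.b_up)
        return [hi + zi for hi, zi in zip(h, z)]  # residual connection


adapter = BottleneckAdapter(d_model=8, bottleneck=2)
h = [0.1 * i for i in range(8)]
print(adapter(h) == h)  # True: zero up-projection means identity at initialisation
```

Starting from an identity map is one common way to ensure that inserting adapters does not disturb the pre-trained representations before any task-specific training has happened.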

Code & Models

Repositories

sinhat98/adapter-wavlm (PyTorch)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Absolute Position Encodings · Softmax