An Adapter Based Pre-Training for Efficient and Scalable Self-Supervised Speech Representation Learning

Samuel Kessler; Bethan Thomas; Salah Karout

arXiv:2107.13530·eess.AS·February 8, 2022

Open Access

TL;DR

This paper introduces an adapter-based pre-training method that efficiently transfers self-supervised speech representations across multiple languages, reducing pre-training time and avoiding forgetting of previous language knowledge.

Contribution

The authors propose using adapter modules with wav2vec 2.0 to significantly decrease pre-training time for new languages while preserving existing language representations.
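To make the idea of an adapter module concrete, here is a minimal sketch of a generic residual bottleneck adapter in plain Python. This summary does not specify the paper's exact adapter architecture or where it is placed inside wav2vec 2.0, so the class name `BottleneckAdapter`, the dimensions, and the zero-initialised up-projection are all illustrative assumptions. The sketch shows only the general pattern: a small trainable module adds a learned correction to features produced by a frozen backbone.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b to a single feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: x + up(relu(down(x))).

    Assumption: a generic adapter design, not necessarily the paper's.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection (dim -> bottleneck) with small random weights.
        self.w_down = [[rng.gauss(0.0, 0.01) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection (bottleneck -> dim) starts at zero, so the adapter
        # is an identity map until it is trained.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, x):
        hidden = [max(0.0, h) for h in linear(self.w_down, self.b_down, x)]
        update = linear(self.w_up, self.b_up, hidden)
        # Residual connection: the frozen backbone's features pass through.
        return [xi + ui for xi, ui in zip(x, update)]

    def num_parameters(self):
        dim, bottleneck = len(self.w_up), len(self.w_down)
        return 2 * dim * bottleneck + dim + bottleneck


# Illustrative sizes only; real transformer features are much wider.
adapter = BottleneckAdapter(dim=8, bottleneck=2)
features = [0.5] * 8
output = adapter(features)
```

Because the up-projection starts at zero, the untrained adapter leaves the backbone's output unchanged, which is one common way to add capacity for a new language without disturbing what the frozen model already represents. Only the adapter's parameters would be updated during pre-training on the new language.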

Findings

01

Adapters reduce pre-training time by 32% when learning a new language.

02

Model retains previous language knowledge without catastrophic forgetting.

03

Effective cross-lingual speech recognition performance.

Abstract

We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and fine-tuning on small annotated datasets is a promising direction for building speech recognition systems. SSL models generally perform SSL on raw audio in a pre-training phase and then fine-tune on a small fraction of annotated data. Such models have produced state-of-the-art results for ASR. However, these models are very expensive to pre-train. We use an existing wav2vec 2.0 model and tackle the problem of learning new language representations while utilizing existing model knowledge. Crucially, we do so without catastrophic forgetting of the existing language representation. We use adapter modules to speed up pre-training a new language task. Our model can decrease…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Adapter