Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition

Theresa Pekarek Rosin; Stefan Wermter

arXiv:2307.07280·cs.CL·October 19, 2023

TL;DR

This paper explores layer-specific fine-tuning and experience replay to adapt large-scale German speech recognition models to new domains, reaching a word error rate (WER) below 5% on the new domain while preserving general speech recognition performance.
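To make "layer-specific fine-tuning" concrete, here is a minimal sketch of how one might decide which parameters to train and which to freeze, based on name prefixes. The function `plan_trainable`, the parameter names, and the choice to train only the decoder are illustrative assumptions, not the configuration used in the paper.

```python
from typing import Dict, Iterable, Sequence


def plan_trainable(param_names: Iterable[str],
                   trainable_prefixes: Sequence[str]) -> Dict[str, bool]:
    """Map each parameter name to True (train) or False (freeze).

    A parameter is trained only if its name starts with one of the
    given prefixes; everything else stays frozen.
    """
    prefixes = tuple(trainable_prefixes)
    return {name: name.startswith(prefixes) for name in param_names}


# Hypothetical parameter names for an encoder-decoder ASR model (assumption).
names = [
    "encoder.layers.0.weight",
    "encoder.layers.1.weight",
    "decoder.layers.0.weight",
    "decoder.layers.1.weight",
]

# Assumption for illustration: fine-tune the decoder, freeze the encoder.
plan = plan_trainable(names, trainable_prefixes=["decoder."])
trainable = [name for name, flag in plan.items() if flag]
print(trainable)
```

In a PyTorch setting, a plan like this could be applied by looping over `model.named_parameters()` and setting `param.requires_grad` to the planned flag for each name.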

Contribution

It introduces a continual learning approach with selective freezing and experience replay to improve domain adaptation in German ASR models.
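The experience replay idea can be sketched as mixing a small sample of original-domain data into each new-domain batch, so the model keeps seeing examples of what it should not forget. The helper `mix_with_replay`, the `replay_fraction` parameter, and the placeholder utterance IDs below are assumptions for illustration; the paper's actual replay ratio and sampling strategy are not given in this summary.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_with_replay(new_batch: Sequence[T],
                    replay_buffer: Sequence[T],
                    replay_fraction: float,
                    rng: random.Random) -> List[T]:
    """Return new_batch plus a random sample of replayed original-domain items.

    replay_fraction is the number of replayed items relative to the size
    of new_batch (0.5 adds one old item for every two new ones).
    """
    n_replay = min(len(replay_buffer), round(len(new_batch) * replay_fraction))
    replayed = rng.sample(list(replay_buffer), n_replay)
    mixed = list(new_batch) + replayed
    rng.shuffle(mixed)  # avoid a fixed new/old ordering inside the batch
    return mixed


rng = random.Random(0)
# Placeholder IDs standing in for audio/transcript pairs (assumption).
new_domain = [f"svc_{i}" for i in range(4)]
original_domain = [f"general_{i}" for i in range(10)]

batch = mix_with_replay(new_domain, original_domain, replay_fraction=0.5, rng=rng)
print(len(batch))
```

A larger `replay_fraction` trades adaptation speed on the new domain for stability on the original one.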

Findings

01 WER below 5% on new domain with limited data

02 Selective freezing preserves general speech recognition performance

03 Experience replay stabilizes performance across domains
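For readers unfamiliar with the metric behind the first finding, word error rate (WER) is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch using standard Levenshtein distance; the German example command is made up for illustration and is not taken from the SVC-de dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference words processed so far
    # (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up voice command: one substituted word out of six.
wer = word_error_rate("schalte das licht im flur an",
                      "schalte das licht im flur aus")
print(f"{wer:.3f}")
```

Under this definition, a WER below 5% means fewer than one word error per twenty reference words on average.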

Abstract

While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still limited to a subset of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, with our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. To further increase the robustness of the ASR…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling

Methods: Experience Replay