Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
Theresa Pekarek Rosin, Stefan Wermter

TL;DR
This paper explores layer-specific fine-tuning and experience replay for adapting large-scale German speech recognition models to new domains, reaching a word error rate (WER) below 5% on the new domain while largely preserving general speech recognition performance.
Contribution
It introduces a continual learning approach with selective freezing and experience replay to improve domain adaptation in German ASR models.
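To make "selective freezing" concrete, here is a minimal, framework-free sketch. It is not the authors' implementation: the `Param` class stands in for a deep learning framework's parameter object, the parameter names are hypothetical, and the choice of which prefix to freeze is purely illustrative. The summary does not say which layers the paper actually freezes.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a framework parameter; only the trainable flag matters here."""
    name: str
    requires_grad: bool = True


def freeze_by_prefix(params, frozen_prefixes):
    """Freeze every parameter whose name starts with one of the given prefixes.

    Returns the names of the parameters that remain trainable.
    """
    prefixes = tuple(frozen_prefixes)
    for p in params:
        p.requires_grad = not p.name.startswith(prefixes)
    return [p.name for p in params if p.requires_grad]


# Hypothetical parameter names for an encoder-decoder ASR model (assumption).
params = [Param(n) for n in (
    "encoder.layers.0.weight",
    "encoder.layers.1.weight",
    "decoder.layers.0.weight",
    "decoder.layers.1.weight",
)]

# Illustrative choice: freeze the encoder and fine-tune only the decoder.
trainable = freeze_by_prefix(params, ["encoder."])
```

In a real training loop, only the parameters left trainable would be passed to the optimizer, so the frozen layers keep the weights learned on the original domain.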
Findings
WER below 5% on new domain with limited data
Selective freezing preserves general speech recognition performance
Experience replay stabilizes performance across domains
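The experience replay idea behind the last finding can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's method: the reservoir-sampling buffer, the `replay_fraction` parameter, and the `("general", i)` / `("svc-de", i)` placeholder examples are all hypothetical. The summary does not specify how replay samples are selected or mixed.

```python
import random


class ReplayBuffer:
    """Fixed-size buffer of original-domain examples, filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        """Keep each example seen so far with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        """Draw up to k distinct stored examples."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_examples, buffer, replay_fraction=0.25):
    """Append replayed original-domain examples to a new-domain batch."""
    n_replay = int(len(new_examples) * replay_fraction)
    return list(new_examples) + buffer.sample(n_replay)


# Fill the buffer with placeholder original-domain examples (assumption).
buffer = ReplayBuffer(capacity=100)
for i in range(1000):
    buffer.add(("general", i))

# Mix a new-domain batch with replayed examples before each training step.
new_batch = [("svc-de", i) for i in range(8)]
batch = mixed_batch(new_batch, buffer, replay_fraction=0.25)
```

Training on such mixed batches keeps the model exposed to original-domain data while it adapts to the new domain, which is the mechanism usually credited with reducing forgetting.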
Abstract
While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still only limited to a subsection of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, with our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. To further increase the robustness of the ASR…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling
Methods: Experience Replay
