Weight Factorization and Centralization for Continual Learning in Speech Recognition
Enes Yavuz Ugan, Ngoc-Quan Pham, Alexander Waibel

TL;DR
This paper introduces a continual learning method for speech recognition, inspired by the human sleep-wake cycle, that uses weight factorization and centralization to prevent catastrophic forgetting in multilingual, rehearsal-free settings.
Contribution
The paper proposes a new continual learning approach with factorization and centralization phases, inspired by human brain processes, to improve speech recognition models without access to original training data.
Findings
Centralization effectively prevents catastrophic forgetting.
The approach improves performance on code-switching datasets.
Knowledge is accumulated in low-rank adapters.
Abstract
Modern neural-network-based speech recognition models are required to continually absorb new data without re-training the whole system, especially in downstream applications that use foundation models and have no access to the original training data. Continually training the models in a rehearsal-free, multilingual, and language-agnostic condition is likely to cause catastrophic forgetting, in which a seemingly insignificant disruption to the weights can severely harm the quality of the models. Inspired by the ability of human brains to learn and consolidate knowledge through the waking-sleeping cycle, we propose a continual learning approach with two distinct phases, factorization and centralization, which learn and merge knowledge respectively. Our experiments on a sequence of varied code-switching datasets showed that the centralization stage can effectively prevent catastrophic forgetting…
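Illustration
The summary above says knowledge is accumulated in low-rank adapters and later merged. The snippet below is a minimal sketch of generic low-rank adapter arithmetic only; it is not the authors' factorization or centralization procedure, which this page does not specify. The function names, matrix shapes, and rank are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical helpers: generic low-rank adapter arithmetic,
# NOT the paper's actual factorization/centralization algorithm.

def adapted_forward(W, A, B, x):
    """Apply the frozen base weight W plus a low-rank update A @ B to x."""
    return W @ x + A @ (B @ x)

def merge_adapter(W, A, B):
    """Fold the low-rank update into the base weight; returns a new matrix."""
    return W + A @ B

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 4, 2              # illustrative sizes (assumed)
W = rng.standard_normal((d_out, d_in))   # stands in for a pretrained weight
A = rng.standard_normal((d_out, rank))   # low-rank factors, e.g. learned on new data
B = rng.standard_normal((rank, d_in))
x = rng.standard_normal(d_in)

W_merged = merge_adapter(W, A, B)

# After merging, the adapter is no longer needed at inference time:
# the merged weight gives the same output as base weight plus adapter.
same_output = np.allclose(adapted_forward(W, A, B, x), W_merged @ x)
```

In this sketch the base weight is only changed by a matrix of rank at most `rank`, which is one common way to limit how far new training can move a pretrained model.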
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis · Speech and Audio Processing
