Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models

Chi-Yuan Hsiao; Ke-Han Lu; Kai-Wei Chang; Chih-Kai Yang; Wei-Chih Chen; Hung-yi Lee

arXiv:2505.17496·cs.CL·May 26, 2025

TL;DR

This paper investigates catastrophic forgetting in end-to-end spoken language models and evaluates mitigation strategies, finding experience replay most effective for balancing knowledge retention and new learning.

Contribution

The paper compares three mitigation strategies for catastrophic forgetting in spoken language models (model merging, discounting the LoRA scaling factor, and experience replay) and highlights the effectiveness of experience replay.

Findings

01 Experience replay significantly reduces forgetting.

02 Combining experience replay with the other mitigation strategies yields further gains.

03 The results offer insights for building more robust SLM training pipelines.
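
The experience replay finding can be illustrated with a minimal sketch, assuming replay means mixing a fraction of earlier-stage examples into the data for the current training stage. The function name `mix_with_replay`, the `replay_ratio` parameter, and the toy data are hypothetical and are not taken from the paper, whose exact replay setup may differ.

```python
import random


def mix_with_replay(new_examples, replay_examples, replay_ratio, seed=0):
    """Build a training set where about `replay_ratio` of items are replayed.

    new_examples:    data for the current stage (e.g. spoken QA pairs)
    replay_examples: data from an earlier stage (e.g. text-only instructions)
    replay_ratio:    target share of replayed items in the mixed set, in [0, 1)
    """
    if not 0.0 <= replay_ratio < 1.0:
        raise ValueError("replay_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_replay / (n_new + n_replay) = replay_ratio for n_replay.
    n_replay = round(len(new_examples) * replay_ratio / (1.0 - replay_ratio))
    # Never ask for more replay items than the earlier-stage pool holds.
    n_replay = min(n_replay, len(replay_examples))
    replayed = rng.sample(list(replay_examples), n_replay)
    mixed = list(new_examples) + replayed
    rng.shuffle(mixed)
    return mixed


# Toy usage: 80 current-stage items plus 20% replayed earlier-stage items.
new_stage = [("sqa", i) for i in range(80)]
old_stage = [("text", i) for i in range(100)]
mixed = mix_with_replay(new_stage, old_stage, replay_ratio=0.2)
```

The replay ratio is the knob that trades retention against new learning in this sketch: a ratio of 0 reduces to ordinary fine-tuning on the new stage only.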

Abstract

End-to-end training of Spoken Language Models (SLMs) commonly involves adapting pre-trained text-based Large Language Models (LLMs) to the speech modality through multi-stage training on diverse tasks such as ASR, TTS, and spoken question answering (SQA). Although this multi-stage continual learning equips LLMs with both speech understanding and generation capabilities, the substantial differences in task and data distributions across stages can lead to catastrophic forgetting, where previously acquired knowledge is lost. This paper investigates catastrophic forgetting and evaluates three mitigation strategies (model merging, discounting the LoRA scaling factor, and experience replay) to balance knowledge retention with new learning. Results show that experience replay is the most effective, with further gains achieved by combining it with other methods. These findings provide insights for…
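
The other two strategies named in the abstract can be sketched in plain Python under stated assumptions. Model merging is shown here as linear interpolation between pre-trained and fine-tuned weights, which is one common form and may not be the scheme the paper uses. Discounting the LoRA scaling factor is shown as multiplying the usual `alpha / rank` scale by a factor `discount` when the low-rank update is added to the frozen weight. The names `merge_weights`, `lora_effective_weight`, `lam`, and `discount` are hypothetical.

```python
def merge_weights(base, tuned, lam):
    """Linearly interpolate two weight dicts: (1 - lam) * base + lam * tuned.

    lam = 0 keeps the pre-trained model; lam = 1 keeps the fine-tuned one.
    """
    return {
        name: [(1.0 - lam) * b + lam * t for b, t in zip(base[name], tuned[name])]
        for name in base
    }


def matmul(left, right):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


def lora_effective_weight(w0, b, a, alpha, rank, discount=1.0):
    """Return W0 + discount * (alpha / rank) * (B @ A).

    discount = 1 is standard LoRA; discount < 1 shrinks the adapter's
    contribution, pulling the model back toward the frozen weight W0.
    """
    scale = discount * alpha / rank
    delta = matmul(b, a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


# Toy usage with a rank-1 adapter on a 2x2 weight matrix.
merged = merge_weights({"w": [0.0, 2.0]}, {"w": [1.0, 4.0]}, lam=0.5)
w_eff = lora_effective_weight(
    w0=[[1.0, 0.0], [0.0, 1.0]],
    b=[[1.0], [2.0]],
    a=[[3.0, 4.0]],
    alpha=2.0,
    rank=1,
    discount=0.5,
)
```

Both functions expose a single scalar (`lam` or `discount`) that moves the model between its pre-trained and fine-tuned behavior, which is why they can be applied after training without further gradient updates.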

Taxonomy

Methods: Experience Replay