Continual Speech Learning with Fused Speech Features

Guitao Wang; Jinming Zhao; Hao Yang; Guilin Qi; Tongtong Wu; Gholamreza Haffari

arXiv:2506.01496 · cs.CL · June 4, 2025



Open Access

TL;DR

This paper proposes a continual learning framework for speech models using a gated-fusion layer on Whisper, enabling dynamic task-specific feature selection and significantly improving adaptation across multiple speech tasks.

Contribution

It introduces a novel continual speech learning setup with a gated-fusion layer on Whisper, enhancing task adaptation without full retraining.

Findings

01. Significant accuracy improvements over traditional methods
02. Effective adaptation to new speech tasks
03. Demonstrated across six speech processing tasks

Abstract

The rapid growth of speech data demands adaptive models, because traditional static methods fail to keep pace with dynamic and diverse speech information. We introduce continual speech learning, a new setup aimed at bridging the adaptation gap in current speech models. We use the encoder-decoder Whisper model to cast speech tasks into a unified generative format. We add a learnable gated-fusion layer on top of the encoder that dynamically selects task-specific features for downstream tasks. On six speech processing tasks, our approach significantly improves accuracy over traditional methods and adapts to new speech tasks without full retraining.
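To make the idea of "dynamically selecting task-specific features" concrete, here is a minimal sketch of one plausible gated-fusion layer. The page does not give the paper's exact formulation, so everything below is an assumption. The sketch supposes that the encoder exposes the hidden states of each of its layers. It also supposes that each layer receives a learnable scalar gate, sigmoid(a_l), and that the fused feature is the gate-weighted average of the layer outputs, computed frame by frame. The function name gated_fusion and the gate-logit parameterisation are hypothetical. A task-specific variant would keep one set of gate logits per task.

```python
import math
from typing import List, Sequence

Vector = List[float]
Frames = List[Vector]  # T frames, each a d-dimensional feature vector


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(layer_states: Sequence[Frames], gate_logits: Sequence[float]) -> Frames:
    """Fuse hidden states from several encoder layers with learnable gates.

    Hypothetical form (not confirmed by the source): layer l gets a scalar
    gate sigmoid(a_l); gates are normalised to sum to 1, and the fused
    feature at each frame is the gate-weighted sum of the layer outputs.
    """
    if len(layer_states) != len(gate_logits):
        raise ValueError("need one gate logit per encoder layer")
    gates = [sigmoid(a) for a in gate_logits]
    total = sum(gates)
    weights = [g / total for g in gates]  # normalised layer weights
    num_frames = len(layer_states[0])
    dim = len(layer_states[0][0])
    fused: Frames = []
    for t in range(num_frames):
        frame = [0.0] * dim
        for w, states in zip(weights, layer_states):
            for k in range(dim):
                frame[k] += w * states[t][k]
        fused.append(frame)
    return fused


# Two toy "encoder layers", one frame each, feature dimension 2.
layers = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(gated_fusion(layers, [0.0, 0.0]))    # equal gates -> [[0.5, 0.5]]
print(gated_fusion(layers, [-10.0, 10.0]))  # gate almost fully on the second layer
```

In a trained model, the gate logits would be learned parameters, and layer_states would come from the Whisper encoder's per-layer outputs. With equal logits the layers contribute equally. Pushing one logit up lets a task draw mostly on the layer whose features suit it best.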


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing