Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems

Chin Yuen Kwok; Jia Qi Yip; Eng Siong Chng

arXiv:2407.03645 · cs.CL · September 30, 2024

TL;DR

This paper introduces four optimizations for continual learning of the auto-regressive decoder in multilingual ASR, reducing the word error rate on pretrained languages when Whisper is adapted to 10 unseen languages.

Contribution

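It proposes four decoder-specific optimizations for continual learning in multilingual ASR: decoder-layer gradient surgery, freezing unused token embeddings, suppressing output of newly added tokens, and learning rate re-scaling. These target the auto-regressive decoder, which the authors hypothesise is why continual learning methods designed for computer vision and reinforcement learning yield sub-optimal results on this task.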

Findings

01

Reduced Average Word Error Rate (AWER) on pretrained languages from 14.2% to 12.4%

02

Maintained performance on pre-trained languages during adaptation

03

Demonstrated effectiveness on Whisper model with 10 new languages

Abstract

Continual Learning (CL) involves fine-tuning pre-trained models with new data while maintaining the performance on the pre-trained data. This is particularly relevant for expanding multilingual ASR (MASR) capabilities. However, existing CL methods, mainly designed for computer vision and reinforcement learning tasks, often yield sub-optimal results when directly applied to MASR. We hypothesise that this is because CL of the auto-regressive decoder in the MASR model is difficult. To verify this, we propose four optimizations on the decoder. They include decoder-layer gradient surgery, freezing unused token embeddings, suppressing output of newly added tokens, and learning rate re-scaling. Our experiments on adapting Whisper to 10 unseen languages from the Common Voice dataset demonstrate that these optimizations reduce the Average Word Error Rate (AWER) of pretrained languages from 14.2%…
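
Two of the four optimizations named in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption: the function names are hypothetical, output suppression is shown as masking logits to negative infinity, and gradient surgery is shown as a generic PCGrad-style projection, which may differ from the paper's decoder-layer variant.

```python
import math
from typing import List, Sequence, Set


def suppress_new_token_logits(
    logits: Sequence[float], new_token_ids: Set[int]
) -> List[float]:
    """Return a copy of `logits` with newly added token ids set to -inf.

    Hypothetical helper: after a softmax, the masked tokens get zero
    probability, so they cannot be emitted for pretrained languages.
    """
    return [-math.inf if i in new_token_ids else x for i, x in enumerate(logits)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; assumes at least one finite logit."""
    finite_max = max(x for x in logits if x != -math.inf)
    exps = [math.exp(x - finite_max) for x in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def project_conflicting_gradient(
    g_new: Sequence[float], g_old: Sequence[float]
) -> List[float]:
    """PCGrad-style projection (an assumed stand-in for gradient surgery).

    If the new-language gradient points against the pretrained-language
    gradient (negative dot product), remove its component along g_old.
    Otherwise return it unchanged.
    """
    dot = sum(a * b for a, b in zip(g_new, g_old))
    if dot >= 0:
        return list(g_new)
    norm_sq = sum(b * b for b in g_old)  # > 0 whenever dot < 0
    return [a - (dot / norm_sq) * b for a, b in zip(g_new, g_old)]


if __name__ == "__main__":
    # Token id 2 plays the role of a newly added token.
    print(softmax(suppress_new_token_logits([1.0, 2.0, 3.0], {2})))
    # The two gradients conflict, so the conflicting component is removed.
    print(project_conflicting_gradient([1.0, -1.0], [0.0, 1.0]))
```

In a real model the same ideas would apply to tensors of decoder logits and per-layer parameter gradients rather than Python lists; the list form is used here only to keep the sketch dependency-free.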

Taxonomy

Topics: Blind Source Separation Techniques · Fault Detection and Control Systems · Wireless Signal Modulation Classification

Methods: Experience Replay