Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition
Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

TL;DR
This paper introduces a gradient remedy method to address gradient interference in multi-task learning for noise-robust speech recognition, leading to significant WER reductions on benchmark datasets.
Contribution
It proposes a novel gradient remedy approach that projects and rescales task gradients to improve multi-task learning in noise-robust speech recognition.
Findings
Achieves 9.3% and 11.1% relative WER reduction on RATS and CHiME-4 datasets.
Effectively resolves gradient interference between speech enhancement and recognition tasks.
Enhances multi-task learning performance in noisy speech recognition scenarios.
Abstract
Speech enhancement (SE) is proved effective in reducing noise from noisy speech signals for downstream automatic speech recognition (ASR), where multi-task learning strategy is employed to jointly optimize these two tasks. However, the enhanced speech learned by SE objective may not always yield good ASR results. From the optimization view, there sometimes exists interference between the gradients of SE and ASR tasks, which could hinder the multi-task learning and finally lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at acute angle to ASR gradient, in order to remove the conflict between them and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
