Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

Yuchen Hu; Chen Chen; Ruizhe Li; Qiushi Zhu; Eng Siong Chng

arXiv:2302.11362·eess.AS·May 4, 2023

TL;DR

This paper introduces a gradient remedy method that addresses interference between the speech enhancement and speech recognition gradients in multi-task learning for noise-robust speech recognition, yielding relative WER reductions of 9.3% on RATS and 11.1% on CHiME-4.

Contribution

It proposes a novel gradient remedy approach that projects and rescales task gradients to improve multi-task learning in noise-robust speech recognition.

Findings

01

Achieves relative WER reductions of 9.3% and 11.1% on the RATS and CHiME-4 datasets, respectively.

02

Effectively resolves gradient interference between speech enhancement and recognition tasks.

03

Enhances multi-task learning performance in noisy speech recognition scenarios.
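
The "relative WER reduction" quoted in Finding 01 is a ratio against the baseline error rate, not a difference in absolute percentage points. The helper below is a minimal sketch of that arithmetic. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only (not results from the paper):
# a baseline WER of 10.0% improved to 9.0% is a 1.0-point absolute gain.
example = relative_wer_reduction(10.0, 9.0)
print(f"relative reduction: {example:.1f}%")
```

With these made-up inputs, the 1.0-point absolute improvement corresponds to a 10.0% relative reduction.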

Abstract

Speech enhancement (SE) has proved effective in reducing noise from noisy speech signals for downstream automatic speech recognition (ASR), where a multi-task learning strategy is employed to jointly optimize these two tasks. However, the enhanced speech learned by the SE objective may not always yield good ASR results. From the optimization view, there sometimes exists interference between the gradients of the SE and ASR tasks, which could hinder multi-task learning and finally lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from the perspectives of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at an acute angle to the ASR gradient, in order to remove the conflict between them and…
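
The abstract is cut off before it gives the exact update rule, so the following is only a rough sketch of the two ideas it names: an angle fix and a magnitude fix. It is not the authors' implementation. It assumes that "conflict" means a negative dot product between the SE and ASR gradients. It uses a fixed target angle `theta`, where the paper describes a dynamic surface. It also uses a simple norm cap as a stand-in for the paper's rescaling step. All function names and the toy 2-D vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def remedy_se_gradient(g_se, g_asr, theta=math.radians(60)):
    """Angle step (assumed form): if g_se conflicts with g_asr, move it so
    that it makes the acute angle `theta` with g_asr."""
    if dot(g_se, g_asr) >= 0:
        return list(g_se)  # no conflict: leave the SE gradient unchanged
    asr_norm = norm(g_asr)
    unit_asr = [x / asr_norm for x in g_asr]
    along = dot(g_se, unit_asr)
    # Part of g_se orthogonal to g_asr (what a plain normal-plane
    # projection would keep).
    g_perp = [s - along * u for s, u in zip(g_se, unit_asr)]
    # Add a positive component along g_asr so the result sits at angle theta.
    shift = norm(g_perp) / math.tan(theta)
    return [p + shift * u for p, u in zip(g_perp, unit_asr)]


def cap_magnitude(g_se, g_asr):
    """Magnitude step (assumed rule): keep the SE gradient norm from
    exceeding the ASR gradient norm."""
    se_norm, asr_norm = norm(g_se), norm(g_asr)
    if se_norm == 0.0 or se_norm <= asr_norm:
        return list(g_se)
    scale = asr_norm / se_norm
    return [scale * x for x in g_se]


# Toy 2-D gradients chosen to conflict (obtuse angle between them).
g_asr = [1.0, 0.0]
g_se = [-1.0, 2.0]
remedied = cap_magnitude(remedy_se_gradient(g_se, g_asr), g_asr)
print("cosine before:", cosine(g_se, g_asr))
print("cosine after: ", cosine(remedied, g_asr))
```

In this toy case the cosine between the two gradients goes from negative to cos(60°) = 0.5, and the remedied SE gradient is no longer than the ASR gradient. Setting `theta` to 90° reduces the angle step to an ordinary projection onto the normal plane of the ASR gradient.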

Code & Models

Repositories

yuchen005/gradient-remedy
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing