Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition

Yuchen Hu; Nana Hou; Chen Chen; Eng Siong Chng

arXiv:2110.05267 · eess.AS · April 11, 2022 · 5 cites

TL;DR

This paper introduces IFF-Net, an interactive feature fusion network that combines enhanced and original noisy speech features for noise-robust speech recognition. It reports a 4.1% absolute word error rate (WER) reduction over the best baseline on the RATS Channel-A corpus.

Contribution

The paper proposes IFF-Net, an architecture that fuses the enhanced feature with the original noisy feature so that information lost to over-suppression during speech enhancement remains available to the downstream ASR model.

Findings

01 Achieves a 4.1% absolute WER reduction over the best baseline on the RATS Channel-A corpus

02 Complements some of the information missing from over-suppressed enhanced features

03 Improves robustness of speech recognition in noisy environments
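The first finding is stated as an absolute WER reduction, which is a difference in percentage points and not a relative improvement. The following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words) and of how absolute and relative reductions differ. The function names and the example WER values are hypothetical and are not taken from the paper, which reports only the 4.1% difference here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in the same units as the inputs (percentage points)."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))

    # Hypothetical WER values chosen only to illustrate the arithmetic.
    baseline, improved = 50.0, 45.9
    print(absolute_reduction(baseline, improved))  # about 4.1 percentage points
    print(relative_reduction(baseline, improved))  # about 0.082, i.e. 8.2% relative
```

The same absolute reduction corresponds to a larger relative reduction when the baseline WER is lower, so the two figures should not be compared directly.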

Abstract

Speech enhancement (SE) aims to suppress the additive noise from a noisy speech signal to improve the speech's perceptual quality and intelligibility. However, the over-suppression phenomenon in the enhanced speech might degrade the performance of the downstream automatic speech recognition (ASR) task due to the missing latent information. To alleviate this problem, we propose an interactive feature fusion network (IFF-Net) for noise-robust speech recognition to learn complementary information from the enhanced feature and the original noisy feature. Experimental results show that the proposed method achieves an absolute word error rate (WER) reduction of 4.1% over the best baseline on the RATS Channel-A corpus. Our further analysis indicates that the proposed IFF-Net can complement some missing information in the over-suppressed enhanced feature.
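The idea of drawing complementary information from the enhanced and noisy features can be illustrated with a simple per-dimension gate. The sketch below is an assumption made for illustration only: it is a generic gated fusion, not the IFF-Net architecture, whose internals are not described on this page. The function `gated_fusion` and its weight parameters are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(enhanced, noisy, w_enh, w_noisy, bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    Illustrative assumption, not the paper's method: the gate is a sigmoid
    of a weighted sum of both inputs. A gate near 1 keeps the enhanced
    value; a gate near 0 falls back to the original noisy value.
    """
    lengths = {len(enhanced), len(noisy), len(w_enh), len(w_noisy), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for e, n, we, wn, b in zip(enhanced, noisy, w_enh, w_noisy, bias):
        gate = sigmoid(we * e + wn * n + b)
        fused.append(gate * e + (1.0 - gate) * n)
    return fused


if __name__ == "__main__":
    # Toy features: the second dimension of the enhanced vector has been
    # suppressed to zero, while the noisy vector still carries energy there.
    enhanced = [0.9, 0.0]
    noisy = [1.2, 0.8]

    # Hand-picked biases (in a real model these would be learned): trust
    # the enhanced value in dimension 0 and the noisy value in dimension 1.
    fused = gated_fusion(enhanced, noisy,
                         w_enh=[0.0, 0.0], w_noisy=[0.0, 0.0],
                         bias=[50.0, -50.0])
    print(fused)
```

With trained weights the gate would depend on the inputs themselves; the fixed biases here only show how a fused feature can restore a dimension that enhancement over-suppressed.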

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques