Cross-attention conformer for context modeling in speech enhancement for ASR

Arun Narayanan; Chung-Cheng Chiu; Tom O'Malley; Quan Wang; Yanzhang He

arXiv:2111.00127 · eess.AS · November 2, 2021 · 1 citation

TL;DR

This paper proposes a cross-attention conformer architecture that leverages contextual information, such as preceding noise segments, to enhance speech features and improve robustness in automatic speech recognition systems.

Contribution

It introduces a cross-attention conformer model that uses cross-attention to integrate sequential context information, which may differ in length from the input audio, into speech enhancement.

Findings

01

Improved noise robustness in ASR using the cross-attention conformer.

02

Effective merging of contextual noise information with input features.

03

Potential for deep contextual modeling in speech enhancement.

Abstract

This work introduces the cross-attention conformer, an attention-based architecture for context modeling in speech enhancement. Because context information is often sequential and can differ in length from the audio to be enhanced, we use cross-attention to summarize contextual information and merge it with the input features. Building on the recently proposed conformer model, which uses self-attention layers as building blocks, the proposed cross-attention conformer can be used to build deep contextual models. As a concrete example, we show how noise context, i.e., a short noise-only audio segment preceding an utterance, can be used to build a speech enhancement feature frontend from cross-attention conformer layers that improves the noise robustness of automatic speech recognition.
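
The core mechanism in the abstract, where queries come from the utterance frames and keys and values come from a context of a different length, can be illustrated with a minimal single-head sketch in plain Python. This is not the authors' implementation: the function names, the dimensions, the random weights, and the residual addition used as the "merge" step are illustrative assumptions. A real conformer layer would also include multi-head projections, layer normalization, and convolution and feed-forward modules, all omitted here.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(x, ctx, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative sketch).

    x:   T x d input features; queries are computed from these.
    ctx: C x d context features (e.g. noise-only frames); keys and values
         are computed from these. C does not have to equal T.
    Returns a T x d matrix: one context summary per input frame.
    """
    q = matmul(x, w_q)
    k = matmul(ctx, w_k)
    v = matmul(ctx, w_v)
    d_k = len(q[0])
    out = []
    for q_t in q:
        # Similarity of this input frame to every context frame.
        scores = [sum(qi * ki for qi, ki in zip(q_t, k_c)) / math.sqrt(d_k) for k_c in k]
        weights = softmax(scores)
        # Weighted average of the context value vectors over the C frames.
        out.append([sum(w * v_c[j] for w, v_c in zip(weights, v)) for j in range(len(v[0]))])
    return out


def merge_context(x, ctx, w_q, w_k, w_v):
    """Assumed merge step: add each frame's context summary back to the input (residual)."""
    summary = cross_attention(x, ctx, w_q, w_k, w_v)
    return [[xi + si for xi, si in zip(x_t, s_t)] for x_t, s_t in zip(x, summary)]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or features (hypothetical data)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4
T, C = 10, 3  # 10 utterance frames, 3 noise-context frames (different lengths)
x = random_matrix(T, d, rng)
noise_ctx = random_matrix(C, d, rng)
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))

y = merge_context(x, noise_ctx, w_q, w_k, w_v)
print(len(y), len(y[0]))  # 10 4
```

The output keeps the utterance's length T regardless of the context length C, which is why cross-attention suits variable-length context: the context is summarized separately for each input frame rather than being concatenated or padded to match.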

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing