Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation

Vinay Kothapally; John H.L. Hansen

arXiv:2211.12632·eess.AS·November 24, 2022

TL;DR

This paper introduces a complex-valued time-frequency self-attention module that explicitly models spectral and temporal dependencies in speech signals, improving dereverberation performance when integrated with deep complex neural networks (DCNNs).

Contribution

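It proposes a complex-valued time-frequency (T-F) attention (TFA) module that explicitly accounts for the inter-dependencies between real and imaginary features when computing attention, which most prior DCNN-based dereverberation studies that use self-attention do not.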
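
Why real/imaginary coupling matters can be illustrated with a generic complex dot-product attention. The sketch below is not the paper's formulation: the function names complex_scores_via_real_ops and complex_attention, the softmax over score magnitudes, and the use of real weights on complex values are all illustrative assumptions. It shows that the complex score matrix Q K^H, computed with real arithmetic only, contains the cross terms Qi Kr^T and Qr Ki^T, which a model attending over real and imaginary channels independently would never form.

```python
import numpy as np

def complex_scores_via_real_ops(q_re, q_im, k_re, k_im):
    """Score matrix Q K^H using only real-valued matmuls.

    With Q = Qr + jQi and K = Kr + jKi:
        Q K^H = (Qr Kr^T + Qi Ki^T) + j (Qi Kr^T - Qr Ki^T)
    The cross terms (Qi Kr^T, Qr Ki^T) couple real and imaginary parts.
    """
    real = q_re @ k_re.T + q_im @ k_im.T
    imag = q_im @ k_re.T - q_re @ k_im.T
    return real, imag

def complex_attention(q, k, v):
    """Hypothetical complex attention: softmax over |Q K^H| / sqrt(d)."""
    d = q.shape[-1]
    s_re, s_im = complex_scores_via_real_ops(q.real, q.imag, k.real, k.imag)
    mag = np.sqrt(s_re**2 + s_im**2) / np.sqrt(d)
    mag -= mag.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(mag)
    w /= w.sum(axis=-1, keepdims=True)      # each row of weights sums to 1
    return w @ v                            # real weights applied to complex values

# Usage: 5 frames of 4-dimensional complex features (random, for illustration).
rng = np.random.default_rng(0)
T, d = 5, 4
q = rng.standard_normal((T, d)) + 1j * rng.standard_normal((T, d))
k = rng.standard_normal((T, d)) + 1j * rng.standard_normal((T, d))
v = rng.standard_normal((T, d)) + 1j * rng.standard_normal((T, d))

s_re, s_im = complex_scores_via_real_ops(q.real, q.imag, k.real, k.imag)
print(np.allclose(s_re + 1j * s_im, q @ k.conj().T))  # real-op scores match numpy's complex product
out = complex_attention(q, k, v)
print(out.shape)
```

Taking the softmax over the score magnitude is only one option; keeping the phase of the scores, or normalizing real and imaginary parts separately, are alternatives with different trade-offs.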

Findings

01. Improves overall speech quality for dereverberation when integrated with DCCRN, evaluated on the REVERB challenge corpus.
02. Improves back-end automatic speech recognition (ASR) accuracy.
03. Outperforms previous self-attention approaches.

Abstract

Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNN) are coupled with self-attention (SA) networks. However, the majority of DCNN-based studies on speech dereverberation that employ self-attention do not explicitly account for the inter-dependencies between real and imaginary features when computing attention. In this study, we propose a complex-valued T-F attention (TFA) module that models spectral and temporal dependencies by computing two-dimensional attention maps across time and frequency dimensions. We validate the effectiveness of our proposed complex-valued TFA module with the deep complex convolutional recurrent network (DCCRN) using the REVERB challenge corpus. Experimental findings indicate that integrating our complex-TFA module with DCCRN improves overall speech quality and performance of back-end…
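
The idea of a two-dimensional T-F attention map can be sketched as a separable gate: one descriptor per frame (pooled over frequency), one per frequency bin (pooled over time), combined by an outer product into a map with the same time-frequency shape as the features. This is a minimal sketch under stated assumptions, not the authors' implementation: the function tf_attention, the pooling of the complex magnitude (which depends on both real and imaginary parts), the channel-mixing matrices w_t and w_f standing in for learned layers, and the real-valued gate are all hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tf_attention(x, w_t, w_f):
    """Hypothetical complex T-F attention gate.

    x   : complex array, shape (C, T, F) = (channels, frames, frequency bins)
    w_t : real (C, C) mixing matrix for the time branch (assumed, stands in for a learned layer)
    w_f : real (C, C) mixing matrix for the frequency branch (assumed)
    Returns the gated features and the 2-D attention map.
    """
    mag = np.abs(x)                              # (C, T, F), combines real and imaginary parts
    time_desc = mag.mean(axis=2)                 # pool over frequency -> (C, T)
    freq_desc = mag.mean(axis=1)                 # pool over time      -> (C, F)
    a_t = sigmoid(w_t @ time_desc)               # time weights in (0, 1), shape (C, T)
    a_f = sigmoid(w_f @ freq_desc)               # frequency weights in (0, 1), shape (C, F)
    a_tf = a_t[:, :, None] * a_f[:, None, :]     # outer product -> 2-D map, shape (C, T, F)
    return a_tf * x, a_tf                        # positive real gate rescales x, keeping its phase

# Usage with random complex features (illustration only).
rng = np.random.default_rng(1)
C, T, F = 2, 6, 8
x = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))
w_t = 0.1 * rng.standard_normal((C, C))
w_f = 0.1 * rng.standard_normal((C, C))

y, a_tf = tf_attention(x, w_t, w_f)
print(y.shape, a_tf.shape)
print(np.linalg.matrix_rank(a_tf[0]))  # each channel's map is an outer product, hence rank 1
```

Because the map is separable, it costs O(T + F) weights per channel instead of O(T x F); whether the paper uses a separable map, a real or complex gate, or different pooling is not stated in this summary.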
