Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation
Vinay Kothapally, John H.L. Hansen

TL;DR
This paper introduces a complex-valued time-frequency self-attention module that explicitly models spectral and temporal dependencies in speech signals, enhancing dereverberation performance in deep complex neural networks.
Contribution
The paper proposes a complex-valued time-frequency (T-F) attention module that accounts for the inter-dependencies between real and imaginary features when computing attention.
Findings
Integrating the complex-TFA module with DCCRN improves overall speech quality on the REVERB challenge corpus.
Enhances automatic speech recognition accuracy.
Outperforms previous self-attention methods.
Abstract
Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNN) are coupled with self-attention (SA) networks. However, the majority of DCNN-based studies on speech dereverberation that employ self-attention do not explicitly account for the inter-dependencies between real and imaginary features when computing attention. In this study, we propose a complex-valued T-F attention (TFA) module that models spectral and temporal dependencies by computing two-dimensional attention maps across time and frequency dimensions. We validate the effectiveness of our proposed complex-valued TFA module with the deep complex convolutional recurrent network (DCCRN) using the REVERB challenge corpus. Experimental findings indicate that integrating our complex-TFA module with DCCRN improves overall speech quality and performance of back-end…
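The idea in the abstract of building a two-dimensional attention map from time and frequency dependencies, and applying it to complex features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The pooling step, the channel-mixing matrices (`w_t_r`, `w_f_r`, `w_t_i`, `w_f_i`), the sigmoid gating, and the function names are all assumptions made for illustration. The only element taken from complex arithmetic itself is the final step, which applies the real and imaginary attention maps with the complex multiplication rule so that each output part depends on both input parts.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, squashing scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def axis_attention(x, w, pool_axis):
    """1-D attention along the axis that is not pooled.

    x: real array of shape (C, T, F).
    w: (C, C) channel-mixing matrix; a hypothetical stand-in for learned layers.
    pool_axis: 2 pools over frequency (time attention), 1 pools over time.
    """
    pooled = x.mean(axis=pool_axis)   # (C, T) or (C, F)
    return sigmoid(w @ pooled)        # same shape as pooled


def tf_attention_map(x, w_t, w_f):
    """Combine time and frequency attention into one (C, T, F) map."""
    a_t = axis_attention(x, w_t, pool_axis=2)   # (C, T)
    a_f = axis_attention(x, w_f, pool_axis=1)   # (C, F)
    return a_t[:, :, None] * a_f[:, None, :]    # outer product per channel


def complex_tfa(z, params):
    """Apply assumed real/imaginary T-F attention maps to complex features z."""
    zr, zi = z.real, z.imag
    ar = tf_attention_map(zr, params["w_t_r"], params["w_f_r"])
    ai = tf_attention_map(zi, params["w_t_i"], params["w_f_i"])
    # Complex multiplication (ar + j*ai) * (zr + j*zi):
    # each output part mixes both the real and the imaginary input.
    out_r = ar * zr - ai * zi
    out_i = ar * zi + ai * zr
    return out_r + 1j * out_i


# Small demo with random features: 4 channels, 10 frames, 8 frequency bins.
rng = np.random.default_rng(0)
C, T, F = 4, 10, 8
z = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))
params = {k: rng.standard_normal((C, C))
          for k in ("w_t_r", "w_f_r", "w_t_i", "w_f_i")}
out = complex_tfa(z, params)
print(out.shape)  # (4, 10, 8)
```

The output keeps the input shape, so a block of this kind could in principle sit between layers of a complex-valued network such as DCCRN. Where the module is placed, and how its weights are learned, is described in the paper and is not reproduced here.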
