Speech Enhancement with Overlapped-Frame Information Fusion and Causal Self-Attention
Yuewei Zhang, Huanbin Zou, Jie Zhu

TL;DR
This paper introduces an overlapped-frame information fusion scheme and a causal self-attention mechanism that improve speech enhancement performance without exceeding the algorithmic delay inherent to time-frequency domain methods.
Contribution
The paper proposes an overlapped-frame fusion approach that exploits the future speech information available within the inherent delay, and a causal time-frequency-channel attention (TFCA) block that strengthens the network's representation capability in causal speech enhancement.
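To make the notion of causal self-attention concrete, the following is a minimal sketch of generic scaled dot-product self-attention along the time axis with a causal mask. It is not the paper's TFCA block, whose internal design is not described here; the function name `causal_self_attention`, the projection matrices, and the toy dimensions are all assumptions chosen for illustration.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Generic single-head self-attention over time with a causal mask.

    x: (T, D) sequence of T frames with D features each.
    w_q, w_k, w_v: (D, D) projection matrices (hypothetical, for illustration).
    Returns the attended output (T, D) and the attention weights (T, T).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Mask out future frames: position t may only attend to positions <= t.
    num_frames = x.shape[0]
    future = np.triu(np.ones((num_frames, num_frames), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
frames = rng.standard_normal((6, 4))  # toy input: 6 frames, 4 features
w_q, w_k, w_v = (rng.standard_normal((4, 4)) for _ in range(3))

out, attn = causal_self_attention(frames, w_q, w_k, w_v)

# Perturbing only the last two frames must leave the first four outputs unchanged.
perturbed = frames.copy()
perturbed[4:] += 1.0
out_perturbed, _ = causal_self_attention(perturbed, w_q, w_k, w_v)
print(np.allclose(out[:4], out_perturbed[:4]))
```

The mask is what makes the operation usable in a causal system: each output frame depends only on the current and earlier frames, so no additional look-ahead is introduced.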
Findings
Outperforms current advanced speech enhancement methods
Demonstrates improved speech quality and noise suppression
Validates effectiveness through extensive experiments
Abstract
For time-frequency (TF) domain speech enhancement (SE) methods, the overlap-and-add operation in the inverse TF transformation inevitably leads to an algorithmic delay equal to the window size. However, typical causal SE systems fail to utilize the future speech information within this inherent delay, thereby limiting SE performance. In this paper, we propose an overlapped-frame information fusion scheme. At each frame index, we construct several pseudo overlapped-frames, fuse them with the original speech frame, and then send the fused results to the SE model. Additionally, we introduce a causal time-frequency-channel attention (TFCA) block to boost the representation capability of the neural network. This block processes the intermediate feature maps in parallel through self-attention-based operations in the time, frequency, and channel dimensions. Experiments demonstrate the…
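The delay claim in the abstract can be checked with a short calculation. The sketch below assumes frames of length `window` that start every `hop` samples, and computes, for each output sample, how many future input samples must arrive before overlap-add can finalize it. The function name `ola_delay` and the values 512 and 256 are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np


def ola_delay(window: int, hop: int, num_samples: int) -> np.ndarray:
    """Per-sample look-ahead required by overlap-add synthesis.

    Frame k covers samples [k * hop, k * hop + window - 1]. Output sample n
    is final only after the last frame covering it has been processed, and
    that frame starts at floor(n / hop) * hop.
    """
    n = np.arange(num_samples)
    last_frame_start = (n // hop) * hop
    last_input_needed = last_frame_start + window - 1
    return last_input_needed - n


delays = ola_delay(window=512, hop=256, num_samples=2048)
print(delays.max())  # 511, i.e. window - 1
print(delays.min())  # 256, i.e. window - hop
```

Under these assumptions the worst-case look-ahead is one sample short of the window size, which matches the algorithmic delay the abstract refers to. The input samples inside that look-ahead are already buffered by the time a frame is processed, and this is the future information the overlapped-frame fusion scheme aims to use.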
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Softmax · Attention Is All You Need
