RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

Mingshuai Liu; Zhuangqi Chen; Xiaopeng Yan; Yuanjun Lv; Xianjun Xia; Chuanzeng Huang; Yijian Xiao; Lei Xie

arXiv:2406.07498·cs.SD·June 12, 2024

Open Access

TL;DR

RaD-Net 2 improves real-time speech enhancement by integrating causal knowledge distillation and complex axial self-attention, yielding better speech quality when the signal is degraded by multiple distortions.

Contribution

The paper introduces RaD-Net 2, which adds causality-based knowledge distillation and complex axial self-attention to address two limitations of RaD-Net: its inability to use future information and the constrained receptive field of its convolution layers.

Findings

01

0.10 OVRL DNSMOS improvement over RaD-Net

02

Uses future information during training while keeping the model causal

03

Enhanced denoising with complex axial self-attention
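The causality-based knowledge distillation behind finding 02 can be illustrated with a toy example. This is a hedged sketch, not the paper's method: both "networks" are hypothetical linear filters over scalar frames, the teacher is a fixed centred moving average that sees two future frames, and the student is a trainable causal FIR filter fitted by gradient descent on a weighted sum of a task loss (MSE to the clean signal) and a distillation loss (MSE to the teacher's output). The names `causal_fir`, `noncausal_teacher`, `kd_loss`, `train_student` and the weight `alpha` are illustrative assumptions; RaD-Net 2's actual distillation targets and loss weighting are not given on this page.

```python
import math
from typing import List, Sequence, Tuple


def causal_fir(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Hypothetical student: y[t] = sum_j w[j] * x[t - j]; uses only current and past frames."""
    return [sum(w[j] * x[t - j] for j in range(len(w)) if t - j >= 0)
            for t in range(len(x))]


def noncausal_teacher(x: Sequence[float], half: int = 2) -> List[float]:
    """Hypothetical teacher: centred moving average, so frame t also sees frames t+1..t+half."""
    out = []
    for t in range(len(x)):
        win = x[max(0, t - half): t + half + 1]
        out.append(sum(win) / len(win))
    return out


def kd_loss(y, clean, teacher, alpha: float) -> float:
    """(1 - alpha) * MSE to the clean target + alpha * MSE to the teacher output (assumed weighting)."""
    n = len(y)
    task = sum((a - b) ** 2 for a, b in zip(y, clean)) / n
    dist = sum((a - b) ** 2 for a, b in zip(y, teacher)) / n
    return (1 - alpha) * task + alpha * dist


def train_student(noisy, clean, taps: int = 4, alpha: float = 0.5,
                  lr: float = 0.05, steps: int = 200) -> Tuple[List[float], List[float]]:
    teacher = noncausal_teacher(noisy)  # teacher outputs stay fixed (frozen teacher)
    w = [0.0] * taps
    n = len(noisy)
    for _ in range(steps):
        y = causal_fir(noisy, w)
        # dL/dy[t] is (2/n) times this combined per-frame residual.
        r = [(1 - alpha) * (yt - ct) + alpha * (yt - st)
             for yt, ct, st in zip(y, clean, teacher)]
        # Chain rule through the FIR filter: dL/dw[j] = (2/n) * sum_t r[t] * x[t - j]
        grad = [2.0 / n * sum(r[t] * noisy[t - j] for t in range(j, n))
                for j in range(taps)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w, teacher


# Toy data: a sinusoid plus alternating +/-0.3 "noise".
clean = [math.sin(0.3 * t) for t in range(64)]
noisy = [c + (0.3 if t % 2 else -0.3) for t, c in enumerate(clean)]
w, teacher = train_student(noisy, clean)
loss_before = kd_loss([0.0] * len(noisy), clean, teacher, 0.5)
loss_after = kd_loss(causal_fir(noisy, w), clean, teacher, 0.5)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Because the student's output at frame t depends only on frames up to t, it can run in real time; the teacher's outputs, which do use future frames, only shape its training targets.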

Abstract

In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed that achieved superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, its failure to use future information and the constrained receptive field of its convolution layers limit the system's performance. To mitigate these problems, we extend RaD-Net to an upgraded version, RaD-Net 2. Specifically, causality-based knowledge distillation is introduced in the first stage to use future information in a causal way: the non-causal repairing network serves as the teacher to improve the performance of the causal repairing network. In addition, in the second stage, complex axial self-attention is applied in the complex feature encoder/decoder of the denoising network. Experimental results on the…
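The abstract does not specify how the complex axial self-attention is formulated, so the following is only a sketch of the generic axial pattern, under stated assumptions: a T×F feature map in which each time-frequency bin holds a 2-dimensional vector (real and imaginary parts stacked as channels), single-head dot-product attention with identity query/key/value projections, and no residual connections, normalization, or learned weights. Attention runs first across frequency bins within each frame, then across frames within each bin with a causal mask, so no output depends on future frames. The functions `attend` and `axial_attention` are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def attend(seq: List[Vec], causal: bool) -> List[Vec]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(seq[0])
    out = []
    for i, q in enumerate(seq):
        keys = seq[: i + 1] if causal else seq  # causal: no attention to later positions
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(wt * k[c] for wt, k in zip(weights, keys)) for c in range(d)])
    return out


def axial_attention(x: List[List[Vec]]) -> List[List[Vec]]:
    """x[t][f] is a C-dim feature (here C=2: real and imaginary parts, an assumption).

    1) Frequency axis: attend across all F bins within each frame (no look-ahead in time).
    2) Time axis: attend across frames within each bin, masked to current and past frames.
    """
    T, F = len(x), len(x[0])
    freq_out = [attend(x[t], causal=False) for t in range(T)]
    out = [[None] * F for _ in range(T)]
    for f in range(F):
        column = [freq_out[t][f] for t in range(T)]
        col_out = attend(column, causal=True)
        for t in range(T):
            out[t][f] = col_out[t]
    return out


# Toy 6-frame x 4-bin map of unit-magnitude "complex" values stored as [real, imag].
x = [[[math.cos(0.5 * t + f), math.sin(0.5 * t + f)] for f in range(4)] for t in range(6)]
y = axial_attention(x)
print(len(y), len(y[0]), len(y[0][0]))  # 6 4 2
```

Splitting attention along the two axes lets each bin draw on the whole spectrum of its frame and on all past frames of its bin, while attending to roughly F + T positions instead of all T·F.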

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis