SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement

Zhibin Qiu; Mengfan Fu; Yinfeng Yu; LiLi Yin; Fuchun Sun; Hao Huang

arXiv:2210.16805·cs.SD·November 1, 2022

TL;DR

SRTNet introduces a novel time-domain speech enhancement method using stochastic refinement with a joint deterministic and stochastic network, demonstrating faster training and sampling with improved quality over existing approaches.

Contribution

The paper presents SRTNet, a new stochastic refinement framework for speech enhancement in the time domain, combining deterministic and stochastic modules for improved performance.

Findings

01. Faster training and sampling compared to traditional diffusion models

02. Higher quality speech enhancement results

03. Feasibility demonstrated both theoretically and experimentally

Abstract

The diffusion model, a recent generative model that is very popular in image generation and audio synthesis, is rarely used in speech enhancement. In this paper, we use the diffusion model as a module for stochastic refinement. We propose SRTNet, a novel method for speech enhancement via Stochastic Refinement entirely in the Time domain. Specifically, we design a joint network consisting of a deterministic module and a stochastic module, which together make up the "enhance-and-refine" paradigm. We demonstrate the feasibility of our method theoretically and show experimentally that it achieves faster training, faster sampling, and higher quality. Our code and enhanced samples are available at https://github.com/zhibinQiu/SRTNet.git.
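To make the "enhance-and-refine" data flow concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. The moving-average smoother, the `guess_residual` helper, the step count, and the noise schedule are all hypothetical stand-ins for trained networks. The sketch also assumes that the stochastic module samples a residual correction which is added to the deterministic estimate, which is one plausible reading of the abstract rather than something it states.

```python
import math
import random


def deterministic_enhance(noisy, width=5):
    """Stand-in deterministic module: a centered moving average (assumption)."""
    half = width // 2
    enhanced = []
    for i in range(len(noisy)):
        window = noisy[max(0, i - half): i + half + 1]
        enhanced.append(sum(window) / len(window))
    return enhanced


def guess_residual(noisy, estimate):
    """Hypothetical conditioner; a trained network would predict this."""
    return [0.5 * (n - e) for n, e in zip(noisy, estimate)]


def stochastic_refine(noisy, estimate, steps=20, seed=0):
    """Stand-in stochastic module: start from Gaussian noise and anneal
    toward the conditioned residual guess, injecting less noise each step.
    This loop only imitates the shape of a reverse diffusion sampler."""
    rng = random.Random(seed)
    target = guess_residual(noisy, estimate)
    residual = [rng.gauss(0.0, 1.0) for _ in target]
    for t in range(steps, 0, -1):
        sigma = 0.1 * (t - 1) / steps  # noise level reaches 0 at the last step
        residual = [
            r + 0.5 * (g - r) + sigma * rng.gauss(0.0, 1.0)
            for r, g in zip(residual, target)
        ]
    # Refined waveform = deterministic estimate + sampled residual.
    return [e + r for e, r in zip(estimate, residual)]


def enhance_and_refine(noisy, seed=0):
    """Enhance first (deterministic), then refine (stochastic)."""
    estimate = deterministic_enhance(noisy)
    return stochastic_refine(noisy, estimate, seed=seed)


# Toy time-domain input: a sine wave with additive Gaussian noise.
_rng = random.Random(1)
clean = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
noisy = [c + _rng.gauss(0.0, 0.3) for c in clean]
refined = enhance_and_refine(noisy)
```

Because the sampler is seeded, repeated calls with the same `seed` return the same waveform, while different seeds give different refinements of the same deterministic estimate. The sketch illustrates only the two-stage structure and makes no claim about enhancement quality.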

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing