Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

TL;DR
Diffiner is a diffusion-based generative refiner that enhances speech quality by post-processing outputs of various speech enhancement methods, improving perceptual quality without needing retraining for each SE technique.
Contribution
We introduce Diffiner, a versatile diffusion-based speech refiner trained on clean speech that can be applied across different SE methods without additional training.
Findings
Improves perceptual speech quality across various SE methods
Operates as a modular post-processing module
Enhances speech quality without retraining for each SE method
Abstract
Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs. To tackle this problem, we introduce a DNN-based generative refiner, Diffiner, aiming to improve the perceptual quality of speech pre-processed by an SE method. We train a diffusion-based generative model by utilizing a dataset consisting of clean speech only. Then, our refiner effectively mixes clean parts newly generated via denoising diffusion restoration into the degraded and distorted parts caused by a preceding SE method, resulting in refined speech. Once our refiner is trained on a set of clean speech, it can be applied to various SE methods without additional training specialized for each SE module. Therefore, our refiner can be a versatile post-processing module w.r.t. SE methods and has high potential in…
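The mixing idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it contains no diffusion model, and the function `mix_refined`, the `generated` array (a stand-in for content a clean-speech generative model might produce), and the `degradation` weight map are all hypothetical names introduced here. It only shows the general notion of keeping the SE output where it is reliable and substituting generated content where it is degraded.

```python
import numpy as np

def mix_refined(se_output, generated, degradation):
    """Blend two magnitude spectrograms bin by bin.

    degradation is an assumed weight map in [0, 1]:
    0 keeps the SE output, 1 uses the generated content.
    """
    w = np.clip(degradation, 0.0, 1.0)
    return (1.0 - w) * se_output + w * generated

# Toy 2x2 "spectrograms" (hypothetical values, for illustration only).
se_output = np.array([[1.0, 0.2], [0.5, 0.0]])
generated = np.array([[0.9, 0.8], [0.5, 0.6]])
degradation = np.array([[0.0, 1.0], [0.0, 0.5]])

refined = mix_refined(se_output, generated, degradation)
print(refined)
```

Because the blend depends only on the SE output and a generated estimate, the same step could in principle follow any SE front end, which is the modularity the abstract emphasizes.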
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders
Methods: Diffusion
