Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Ryosuke Sawata; Naoki Murata; Yuhta Takida; Toshimitsu Uesaka; Takashi Shibuya; Shusuke Takahashi; Yuki Mitsufuji

arXiv:2210.17287 · eess.AS · August 31, 2023 · 1 cites

TL;DR

Diffiner is a diffusion-based generative refiner that post-processes the output of a speech enhancement (SE) method to improve its perceptual quality, with no retraining needed for each SE method.

Contribution

We introduce Diffiner, a versatile diffusion-based speech refiner that is trained only on clean speech and can be applied to different speech enhancement (SE) methods without additional training.

Findings

01 Improves perceptual speech quality across various SE methods

02 Operates as a modular post-processing step

03 Enhances speech quality without retraining for each SE method

Abstract

Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs. To tackle this problem, we introduce a DNN-based generative refiner, Diffiner, which aims to improve the perceptual quality of speech pre-processed by an SE method. We train a diffusion-based generative model on a dataset consisting of clean speech only. Then, our refiner effectively mixes clean parts newly generated via denoising diffusion restoration into the degraded and distorted parts caused by a preceding SE method, resulting in refined speech. Once our refiner is trained on a set of clean speech, it can be applied to various SE methods without additional training specialized for each SE module. Therefore, our refiner can be a versatile post-processing module w.r.t. SE methods and has high potential in…
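The mixing idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or the sony/diffiner API: the real refiner relies on denoising diffusion restoration, whereas the code below only shows the general shape of a post-processor that blends newly generated samples into the degraded parts of an SE output and stays unchanged when the upstream SE method is swapped. Every name here (`blend`, `refine`, `toy_generator`, `se_method_a`, `se_method_b`) and the per-sample `weights` are hypothetical and chosen for illustration.

```python
from typing import Callable, List

Signal = List[float]


def blend(se_output: Signal, generated: Signal, weights: Signal) -> Signal:
    """Convex per-sample mix: weight 0 keeps the SE output, weight 1 takes the generated sample."""
    if not (len(se_output) == len(generated) == len(weights)):
        raise ValueError("all inputs must have the same length")
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    return [(1.0 - w) * s + w * g for s, g, w in zip(se_output, generated, weights)]


def refine(se_output: Signal,
           generator: Callable[[Signal], Signal],
           weights: Signal) -> Signal:
    """Post-process any SE output; nothing here depends on which SE method produced it."""
    return blend(se_output, generator(se_output), weights)


def toy_generator(x: Signal) -> Signal:
    """Hypothetical stand-in for a trained generative model: a 3-point moving average."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def se_method_a(noisy: Signal) -> Signal:
    """Toy 'SE method' A (assumption): simple attenuation."""
    return [0.5 * v for v in noisy]


def se_method_b(noisy: Signal) -> Signal:
    """Toy 'SE method' B (assumption): hard clipping to [-1, 1]."""
    return [max(-1.0, min(1.0, v)) for v in noisy]


noisy = [0.0, 2.0, -2.0, 1.0]
# Hypothetical per-sample estimate of how degraded the SE output is (1 = fully degraded).
weights = [0.0, 1.0, 0.5, 0.0]

# The same refiner is reused for both SE front ends without any change.
refined_a = refine(se_method_a(noisy), toy_generator, weights)
refined_b = refine(se_method_b(noisy), toy_generator, weights)
```

Samples with weight 0 pass through untouched, and samples with weight 1 are replaced entirely by the generated value. How the degraded regions are identified and how the clean parts are generated is the paper's actual contribution and is not modeled here.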

Code & Models

Repositories

sony/diffiner
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders

Methods: Diffusion