uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

Muqiao Yang; Chunlei Zhang; Yong Xu; Zhongweiyang Xu; Heming Wang; Bhiksha Raj; Dong Yu

arXiv:2310.00900·cs.SD·October 3, 2023

Open Access

TL;DR

uSee is a unified generative model that uses conditional diffusion to enhance and edit speech signals. It supports controllable modifications driven by different conditions and reports better denoising and dereverberation performance than related generative speech enhancement models.

Contribution

This paper introduces uSee, a novel unified diffusion-based model for simultaneous speech enhancement and editing, enabling controllable, multi-condition speech processing in a single framework.

Findings

01 Superior speech denoising and dereverberation performance

02 Effective speech editing with environmental and noise controls

03 Generates high-quality, controllable speech outputs

Abstract

Speech enhancement aims to improve speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs. In this paper, we propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner. Specifically, by providing multiple types of conditions, including self-supervised learning embeddings and proper text prompts, to the score-based diffusion model, we enable controllable generation so that the unified speech enhancement and editing model performs the corresponding actions on the source speech. Our experiments show that our proposed uSee model can achieve superior performance in both speech denoising and dereverberation compared to other related generative speech enhancement models, and can perform…
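The conditioning idea in the abstract can be illustrated with a toy sketch. This is not the uSee architecture or training procedure: the noise schedule, the `build_condition` fusion by concatenation, and the closed-form `toy_score` (which stands in for a trained score network) are all assumptions made for illustration. It only shows how a condition vector built from two embedding types can steer a reverse-time, variance-exploding diffusion sampler toward different outputs.

```python
import math
import random

# Assumed noise-schedule bounds and toy data spread (not taken from the paper).
SIGMA_MIN, SIGMA_MAX = 0.01, 10.0
DATA_STD = 0.1


def sigma(t):
    """Variance-exploding noise level at time t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def diffusion_coeff(t):
    """g(t), chosen so that g(t)^2 = d sigma(t)^2 / dt for the schedule above."""
    return sigma(t) * math.sqrt(2.0 * math.log(SIGMA_MAX / SIGMA_MIN))


def build_condition(ssl_embedding, prompt_embedding):
    """Hypothetical fusion: concatenate the two condition types into one vector."""
    return list(ssl_embedding) + list(prompt_embedding)


def toy_score(x, t, cond):
    """Analytic score of N(mean(cond), DATA_STD^2 + sigma(t)^2).

    Stands in for a trained conditional score network; the condition
    only sets the mean of the toy target distribution.
    """
    target = sum(cond) / len(cond)
    var = DATA_STD ** 2 + sigma(t) ** 2
    return [-(xi - target) / var for xi in x]


def sample(cond, dim=4, steps=500, seed=0):
    """Euler-Maruyama integration of the reverse-time SDE from t=1 down to t=0."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma(1.0)) for _ in range(dim)]  # draw from the wide prior
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        g = diffusion_coeff(t)
        score = toy_score(x, t, cond)
        x = [
            xi + g * g * si * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for xi, si in zip(x, score)
        ]
    return x


# Two different (made-up) condition vectors steer the sampler to different outputs.
cond_a = build_condition([1.0, 1.0], [1.0, 1.0])
cond_b = build_condition([-1.0, -1.0], [-1.0, -1.0])
sample_a = sample(cond_a)
sample_b = sample(cond_b)
```

In this sketch the same sampler and the same random seed produce outputs clustered near different values purely because the condition vector changes, which is the sense in which conditioning makes generation controllable.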

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development

Methods: Diffusion