Universal Score-based Speech Enhancement with High Content Preservation

Robin Scheibler; Yusuke Fujita; Yuma Shirahata; Tatsuya Komatsu

arXiv:2406.12194 · eess.AS · June 19, 2024 · 1 citation

TL;DR

UNIVERSE++ is a universal speech enhancement method that combines score-based diffusion, adversarial training, and low-rank adaptation to improve speech quality and content preservation across diverse noisy conditions.

Contribution

The paper introduces architectural improvements, adversarial loss, and low-rank adaptation with phoneme fidelity to enhance a universal speech enhancement model.
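To make the low-rank adaptation idea concrete, here is a minimal sketch of a generic LoRA-style linear layer in plain Python. It is not the authors' implementation: the class name `LoRALinear`, the helper `matvec`, the `rank` and `alpha` parameters, and the toy weight matrix are all assumptions made for illustration. The phoneme fidelity loss that would drive the adapter's training is not shown, since this page does not specify it.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of low-rank adaptation, not the paper's code.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        # A starts small and random, B starts at zero, so the update is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.weight, x)               # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Count adapter parameters: rank * in_dim + out_dim * rank."""
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


# Toy 3x4 "pretrained" weight (assumed values, for illustration only).
weight = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
layer = LoRALinear(weight, rank=1)
x = [1.0, 2.0, 3.0, 4.0]
y = layer(x)  # matches the frozen layer's output, because B starts at zero
```

Only `A` and `B` would be updated during adaptation, so the adapter here has 7 trainable values against 12 in the frozen weight. With realistic layer sizes the gap is far larger.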

Findings

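01. Compares favorably to existing discriminative and generative baselines on multiple public benchmark datasets.

02. Achieves high content preservation and speech intelligibility, aided by low-rank adaptation with a phoneme fidelity loss.

03. Demonstrates robustness across noise, reverberation, and various distortion types.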

Abstract

We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high-quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content preservation in the enhanced speech. In the experiments, we train a universal enhancement model on a large-scale dataset of speech degraded by noise, reverberation, and various distortions. The results on multiple public benchmark datasets demonstrate that UNIVERSE++ compares favorably to both discriminative and generative…
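As background for the "score-based diffusion" part of the abstract, the sketch below shows a generic denoising score matching loss in plain Python. It is a textbook formulation under stated assumptions, not the UNIVERSE++ training objective: the function name `dsm_loss`, the `score_fn(noisy, sigma)` signature, and the toy signal are hypothetical, and the conditioning on degraded-speech features described in the abstract is omitted.

```python
import random


def dsm_loss(score_fn, clean, sigma, rng):
    """Denoising score matching loss for one sample at noise level sigma.

    The clean signal is perturbed with Gaussian noise of std sigma. The score of
    N(clean, sigma^2 I) at the noisy point is -(noisy - clean) / sigma^2, which
    equals -noise / sigma. With the usual sigma^2 weighting, the squared error
    becomes mean((sigma * predicted_score + noise)^2).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [c + sigma * n for c, n in zip(clean, noise)]
    pred = score_fn(noisy, sigma)
    return sum((sigma * p + n) ** 2 for p, n in zip(pred, noise)) / len(clean)


# Toy "clean speech" samples (assumed values, for illustration only).
clean = [0.5, -1.0, 2.0]


def oracle_score(noisy, sigma):
    """Exact score for this single clean signal; a network would approximate it."""
    return [-(y - c) / sigma ** 2 for y, c in zip(noisy, clean)]


def zero_score(noisy, sigma):
    """A trivial model that always predicts a zero score."""
    return [0.0] * len(noisy)


rng = random.Random(0)
loss_oracle = dsm_loss(oracle_score, clean, 0.5, rng)  # numerically close to zero
loss_zero = dsm_loss(zero_score, clean, 0.5, rng)      # strictly positive
```

The oracle drives the loss to (numerically) zero, while an uninformative model does not. A trained score network sits between the two and is then used to iteratively denoise samples at inference time.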

Code & Models

Repositories

line/open-universe
PyTorch · Official

Models

🤗 line-corporation/open-universe
model · 390 dl · ♡ 3

Taxonomy

Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Advanced Adaptive Filtering Techniques

Methods: Diffusion