Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders

Xingwei Sun; Heinrich Dinkel; Yadong Niu; Linzhang Wang; Junbo Zhang; Jian Luan

arXiv:2506.11514·eess.AS·June 16, 2025

TL;DR

This paper presents a speech enhancement method that denoises embeddings from a pre-trained generative audioencoder and resynthesizes clean speech with a vocoder. It outperforms systems built on discriminative audioencoders in objective evaluation and achieves higher perceptual quality in subjective listening tests.

Contribution

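The paper introduces a speech enhancement approach that combines a pre-trained generative audioencoder, a compact denoise encoder, and a vocoder. It reports better enhancement and speaker fidelity than systems based on discriminative audioencoders, and an ablation study supports the parameter efficiency of the denoise encoder.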

Findings

01. Outperforms discriminative audioencoder-based models in speech enhancement.

02. Achieves higher perceptual quality in subjective listening tests.

03. Uses fewer parameters with an efficient denoising encoder.

Abstract

Recent research has delved into speech enhancement (SE) approaches that leverage audio embeddings from pre-trained models, diverging from time-frequency masking or signal prediction techniques. This paper introduces an efficient and extensible SE method. Our approach involves initially extracting audio embeddings from noisy speech using a pre-trained audioencoder, which are then denoised by a compact encoder network. Subsequently, a vocoder synthesizes the clean speech from denoised embeddings. An ablation study substantiates the parameter efficiency of the denoise encoder with a pre-trained audioencoder and vocoder. Experimental results on both speech enhancement and speaker fidelity demonstrate that our generative audioencoder-based SE system outperforms models utilizing discriminative audioencoders. Furthermore, subjective listening tests validate that our proposed system surpasses…
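The three-stage data flow described in the abstract (pre-trained audioencoder, compact denoise encoder, vocoder) can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the random linear maps standing in for the pre-trained encoder and vocoder, the residual MLP used as the denoise encoder, and all sizes (`FRAME`, `EMB_DIM`, `HIDDEN`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME = 160    # samples per frame (hypothetical value)
EMB_DIM = 32   # embedding size (hypothetical value)
HIDDEN = 16    # denoise-encoder hidden width (hypothetical value)

# Frozen stand-ins for the pre-trained audioencoder and vocoder.
# In a real system these would be loaded from checkpoints and not updated.
W_enc = rng.standard_normal((FRAME, EMB_DIM)) / np.sqrt(FRAME)
W_voc = rng.standard_normal((EMB_DIM, FRAME)) / np.sqrt(EMB_DIM)

# The only trainable part in this sketch: a compact denoise encoder.
W1 = rng.standard_normal((EMB_DIM, HIDDEN)) * 0.1
W2 = rng.standard_normal((HIDDEN, EMB_DIM)) * 0.1


def encode(wave):
    """Split the waveform into frames and project each frame to an embedding."""
    n_frames = len(wave) // FRAME
    frames = wave[: n_frames * FRAME].reshape(n_frames, FRAME)
    return frames @ W_enc


def denoise(emb):
    """Map noisy embeddings to denoised ones (residual MLP, an assumption)."""
    return emb + np.tanh(emb @ W1) @ W2


def vocode(emb):
    """Synthesize a waveform from embeddings, one frame per embedding."""
    return (emb @ W_voc).reshape(-1)


def enhance(noisy):
    """Full pipeline: noisy waveform -> embeddings -> denoised -> waveform."""
    return vocode(denoise(encode(noisy)))


noisy = rng.standard_normal(16000)   # one second of placeholder "audio"
enhanced = enhance(noisy)

trainable = W1.size + W2.size        # parameters that training would update
frozen = W_enc.size + W_voc.size     # parameters kept fixed
print(enhanced.shape, trainable, frozen)
```

In this layout only the denoise encoder would be trained, which is one way to read the abstract's claim about parameter efficiency: the trainable component stays small because the pre-trained encoder and vocoder are reused as-is.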

Code & Models

Repositories

xiaomi-research/dasheng-denoiser (PyTorch, official)

Models

🤗 mispeech/dasheng-denoiser (model · 212 dl · ♡ 3)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing