Personalized Speech Enhancement Without a Separate Speaker Embedding Model

Tanel Pärnamaa; Ando Saabas

arXiv:2406.09928 · cs.SD · June 17, 2024


TL;DR

This paper introduces a personalized speech enhancement (PSE) method that uses the model’s own internal representations as speaker embeddings. This eliminates the need for a separate embedding model while matching or exceeding the performance of the standard approach based on a pre-trained embedding model.

Contribution

The proposed approach simplifies PSE systems by removing the need for an external speaker embedding model while maintaining or improving performance on noise suppression and echo cancellation tasks.

Findings

01

Performs as well as or better than the standard method of using a pre-trained speaker embedding model.

02

Outperforms the ICASSP 2023 Deep Noise Suppression Challenge winner by 0.15 in MOS.

03

Reduces system complexity by internalizing speaker representation extraction.

Abstract

Personalized speech enhancement (PSE) models can improve the audio quality of teleconferencing systems by adapting to the characteristics of a speaker's voice. However, most existing methods require a separate speaker embedding model to extract a vector representation of the speaker from enrollment audio, which adds complexity to the training and deployment process. We propose to use the internal representation of the PSE model itself as the speaker embedding, thereby avoiding the need for a separate model. We show that our approach performs equally well or better than the standard method of using a pre-trained speaker embedding model on noise suppression and echo cancellation tasks. Moreover, our approach surpasses the ICASSP 2023 Deep Noise Suppression Challenge winner by 0.15 in Mean Opinion Score.
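The abstract does not say how the internal representation is turned into an embedding or how the embedding conditions the enhancement. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: a hypothetical `ToyPSEModel` with random weights, where the speaker embedding is the time-averaged output of the model's own encoder on enrollment frames (mean pooling is an assumption), and enhancement concatenates that embedding with each noisy frame's hidden state to predict a magnitude mask (also an assumption). What it illustrates is structural: one encoder serves both enrollment and enhancement, so no second network has to be trained or deployed.

```python
import math
import random
from typing import List

Vector = List[float]


class ToyPSEModel:
    """Hypothetical toy PSE model: one shared encoder layer plus a mask head.

    The same encoder processes the noisy input and the enrollment audio,
    so no separate speaker embedding model is involved.
    """

    def __init__(self, n_bins: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Encoder weights (n_hidden x n_bins); random here, trained in practice
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(n_bins)]
                      for _ in range(n_hidden)]
        # Mask head sees [hidden state ; speaker embedding] -> 2 * n_hidden inputs
        self.w_mask = [[rng.gauss(0.0, 0.1) for _ in range(2 * n_hidden)]
                       for _ in range(n_bins)]

    def encode(self, frame: Vector) -> Vector:
        # Internal representation of one magnitude frame: tanh(W_enc @ frame)
        return [math.tanh(sum(w * x for w, x in zip(row, frame)))
                for row in self.w_enc]

    def speaker_embedding(self, enrollment: List[Vector]) -> Vector:
        # Assumption: mean-pool the model's own hidden states over enrollment frames
        hidden = [self.encode(f) for f in enrollment]
        n = len(hidden)
        return [sum(h[i] for h in hidden) / n for i in range(len(hidden[0]))]

    def enhance(self, noisy: List[Vector], embedding: Vector) -> List[Vector]:
        out = []
        for frame in noisy:
            # Assumption: condition by concatenating hidden state and embedding
            features = self.encode(frame) + embedding
            # Sigmoid mask in (0, 1), one value per frequency bin
            mask = [1.0 / (1.0 + math.exp(-sum(w * z for w, z in zip(row, features))))
                    for row in self.w_mask]
            out.append([m * x for m, x in zip(mask, frame)])
        return out


if __name__ == "__main__":
    rng = random.Random(1)
    model = ToyPSEModel(n_bins=8, n_hidden=4)
    # Synthetic non-negative "magnitude spectra" standing in for real audio
    enrollment = [[abs(rng.gauss(0, 1)) for _ in range(8)] for _ in range(20)]
    noisy = [[abs(rng.gauss(0, 1)) for _ in range(8)] for _ in range(5)]
    emb = model.speaker_embedding(enrollment)
    enhanced = model.enhance(noisy, emb)
    print(len(emb), len(enhanced), len(enhanced[0]))  # 4 5 8
```

In a real system the weights would be trained end to end, so the shared encoder would learn representations useful both for describing the speaker and for suppressing noise; with random weights, the output here only demonstrates the data flow.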


Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques