SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation

Ziling Huang; Haixin Guan; Haoran Wei; Yanhua Long

arXiv:2501.11274 · eess.AS · January 22, 2025

TL;DR

SEF-PNet introduces a speaker encoder-free approach for personalized speech enhancement, utilizing local-global context aggregation and interactive speaker adaptation to improve performance without relying on pre-trained speaker models.

Contribution

The paper proposes SEF-PNet, a novel PSE model that eliminates the need for speaker encoders by leveraging enrollment speech and noisy mixtures with innovative context aggregation and adaptation techniques.

Findings

01. Outperforms baseline models on Libri2Mix dataset
02. Achieves state-of-the-art PSE performance
03. Effectively utilizes enrollment speech without speaker encoders

Abstract

Personalized speech enhancement (PSE) methods typically rely on pre-trained speaker verification models or self-designed speaker encoders to extract target speaker clues, guiding the PSE model in isolating the desired speech. However, these approaches suffer from significant model complexity and often underutilize enrollment speaker information, limiting the potential performance of the PSE model. To address these limitations, we propose a novel Speaker Encoder-Free PSE network, termed SEF-PNet, which fully exploits the information present in both the enrollment speech and noisy mixtures. SEF-PNet incorporates two key innovations: Interactive Speaker Adaptation (ISA) and Local-Global Context Aggregation (LCA). ISA dynamically modulates the interactions between enrollment and noisy signals to enhance the speaker adaptation, while LCA employs advanced channel attention within the PSE…
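To make the idea of enrollment-conditioned channel attention concrete, here is a minimal sketch in plain Python. It is not the SEF-PNet implementation and does not reproduce ISA or LCA, whose details are not given in the abstract above. It only illustrates a generic squeeze-and-excitation-style gate, under the assumption that enrollment and noisy features share the same number of channels. The function `channel_gate` and the parameters `weight` and `bias` are hypothetical names introduced for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(enroll, noisy, weight, bias):
    """Scale each noisy channel by a gate computed from enrollment features.

    enroll, noisy: lists of channels, each channel a flat list of floats.
    weight (C x C) and bias (C) are hypothetical parameters, fixed here
    rather than learned.
    """
    # Squeeze: one global average per enrollment channel.
    squeezed = [sum(ch) / len(ch) for ch in enroll]
    # Excite: linear mix of the channel statistics, then a sigmoid,
    # so every gate lies strictly between 0 and 1.
    gates = [
        sigmoid(sum(w * s for w, s in zip(row, squeezed)) + b)
        for row, b in zip(weight, bias)
    ]
    # Modulate: rescale every value of noisy channel c by gate c.
    gated = [[g * v for v in ch] for g, ch in zip(gates, noisy)]
    return gated, gates


# Toy features: 2 channels, 3 values each (assumed shapes, not real data).
enroll = [[1.0, 1.0, 1.0], [-2.0, -2.0, -2.0]]
noisy = [[0.5, -0.5, 1.0], [2.0, 0.0, -2.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity mixing for readability
bias = [0.0, 0.0]

gated, gates = channel_gate(enroll, noisy, weight, bias)
```

With identity mixing, a channel whose enrollment average is positive keeps most of its energy, while one with a negative average is attenuated. A trained model would learn the mixing weights instead of fixing them.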

Code & Models

Repositories

ishuangziling/sef-pnet (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Softmax · Attention Is All You Need