Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation

Hassan Taherian, Sefik Emre Eskimez, and Takuya Yoshioka

arXiv:2211.02944·eess.AS·November 8, 2022

Open Access

TL;DR

This paper introduces a training framework for personalized speech enhancement (PSE) that uses cross-task knowledge distillation with a personalized voice activity detector (pVAD) to mitigate the trade-off between over-suppressing target speech and leaking interfering speech.

Contribution

The paper proposes a PSE training method that uses cross-task knowledge distillation with a personalized voice activity detector (pVAD) to mitigate the trade-off between target-speech over-suppression and interference leakage.

Findings

01. Reduces interference leakage in silent target speaker segments

02. Balances speech suppression and interference leakage effectively

03. Improves PSE performance across various scenarios

Abstract

Personalized speech enhancement (PSE) models achieve promising results compared with unconditional speech enhancement models due to their ability to remove interfering speech in addition to background noise. Unlike unconditional speech enhancement, causal PSE models may occasionally remove the target speech by mistake. The PSE models also tend to leak interfering speech when the target speaker is silent for an extended period. We show that existing PSE methods suffer from a trade-off between speech over-suppression and interference leakage by addressing one problem at the expense of the other. We propose a new PSE model training framework using cross-task knowledge distillation to mitigate this trade-off. Specifically, we utilize a personalized voice activity detector (pVAD) during training to exclude the non-target speech frames that are wrongly identified as containing the target…
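The frame-exclusion idea in the last visible sentence of the abstract can be illustrated with a small sketch. This is only one reading of that sentence, not the authors' implementation: the abstract is truncated, so how the pVAD output enters the actual training loss is not known here. The function names `keep_mask` and `masked_mean_loss`, the 0.5 threshold, and the use of reference target-activity labels are all assumptions made for illustration.

```python
import numpy as np

def keep_mask(pvad_prob, target_active, threshold=0.5):
    """Return a boolean mask that is True for frames kept in the loss.

    Hypothetical helper: a frame is dropped when the reference says the
    target speaker is inactive but the pVAD score still flags it as target.
    """
    pvad_prob = np.asarray(pvad_prob, dtype=float)
    target_active = np.asarray(target_active, dtype=bool)
    # Non-target frames wrongly identified as containing the target.
    false_alarm = (~target_active) & (pvad_prob >= threshold)
    return ~false_alarm

def masked_mean_loss(frame_loss, mask):
    """Average a per-frame loss over the kept frames only."""
    frame_loss = np.asarray(frame_loss, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if mask.sum() == 0:
        # Nothing to average; return zero rather than dividing by zero.
        return 0.0
    return float(frame_loss[mask].mean())

# Toy example with four frames (all values are made up).
pvad_prob = [0.9, 0.8, 0.1, 0.7]      # assumed pVAD target-speaker scores
target_active = [1, 0, 0, 1]          # assumed reference activity labels
frame_loss = [1.0, 10.0, 3.0, 2.0]    # assumed per-frame enhancement loss

mask = keep_mask(pvad_prob, target_active)
loss = masked_mean_loss(frame_loss, mask)
```

In this toy input the second frame is a pVAD false alarm, so it is the only frame left out of the average.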

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development

Methods: Knowledge Distillation