OSSEM: one-shot speaker adaptive speech enhancement using meta learning

Cheng Yu; Szu-Wei Fu; Tsun-An Hsieh; Yu Tsao; Mirco Ravanelli

arXiv:2111.05703 · eess.AS · November 11, 2021

TL;DR

OSSEM is a meta-learning-based, one-shot speaker-adaptive speech enhancement system. It adapts to an individual speaker from a single utterance, runs causally in real time, and performs competitively with state-of-the-art causal SE systems.

Contribution

The paper presents a novel meta-learning approach for speaker adaptation in speech enhancement, enabling effective one-shot adaptation with a causal, real-time system.

Findings

01. Effective speaker adaptation with only one utterance.

02. Competitive performance against state-of-the-art causal SE systems.

03. Real-time, causal speech enhancement achieved.

Abstract

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach (called OSSEM) that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified transformer SE network and a speaker-specific masking (SSM) network. In practice, the SSM network takes an enrolled speaker embedding extracted using ECAPA-TDNN to adjust the input noisy feature through masking. To evaluate OSSEM, we designed a modified Voice Bank-DEMAND dataset, in which one utterance from the testing set was used for model adaptation, and the remaining utterances were used for testing the performance. Moreover, we set restrictions allowing the enhancement process to be conducted…
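The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' SSM network, whose architecture is not given here. It assumes the mask is a single linear projection of the speaker embedding followed by a sigmoid, applied element-wise to a spectrogram-like feature. The function names, the 192-dimensional embedding, the 257 frequency bins, and the random values standing in for an ECAPA-TDNN embedding and a noisy feature are all hypothetical.

```python
import numpy as np

def speaker_mask(embedding, weight, bias):
    """Map a speaker embedding to a per-frequency mask in (0, 1).

    Hypothetical stand-in for the SSM network: one linear layer + sigmoid.
    """
    logits = weight @ embedding + bias
    return 1.0 / (1.0 + np.exp(-logits))

def apply_speaker_mask(noisy_feature, mask):
    """Scale every frame of a (frames, bins) feature by the same mask."""
    return noisy_feature * mask[np.newaxis, :]

rng = np.random.default_rng(0)
emb_dim, n_bins, n_frames = 192, 257, 100  # illustrative sizes (assumed)

# Random placeholders: in practice the embedding would come from a speaker
# encoder applied to the single enrollment utterance.
embedding = rng.standard_normal(emb_dim)
weight = 0.1 * rng.standard_normal((n_bins, emb_dim))
bias = np.zeros(n_bins)
noisy = np.abs(rng.standard_normal((n_frames, n_bins)))

mask = speaker_mask(embedding, weight, bias)
masked = apply_speaker_mask(noisy, mask)  # would feed the SE network
```

Because the mask depends only on the enrollment embedding, it can be computed once per speaker and reused for every incoming frame, which is compatible with causal, frame-by-frame processing.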

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders