Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Viet Anh Trinh (1), Sebastian Braun (2) ((1) CUNY Graduate Center, (2) Microsoft Research)

arXiv:2111.08678·eess.AS·February 22, 2022

TL;DR

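This paper introduces an unsupervised speech enhancement method that combines speech recognition embeddings with disentanglement losses. It targets two weaknesses of supervised systems: the domain mismatch between synthetic training data and real-world noisy recordings, and the trade-off between enhancement quality and speech recognition (ASR) performance.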

Contribution

The paper proposes an unsupervised loss function that extends the MixIT loss with a speech recognition embedding and a disentanglement loss, with the aim of improving both speech enhancement and ASR performance.

Findings

1. Improves speech enhancement over a supervised baseline on the VoxCeleb dataset.

2. Joint supervised and unsupervised training achieves comparable speech quality and better ASR performance.

3. Fully unsupervised training alone does not surpass the supervised baseline.

Abstract

Speech enhancement has recently achieved great success with various deep learning methods. However, most conventional speech enhancement systems are trained with supervised methods that impose two significant challenges. First, a majority of training datasets for speech enhancement systems are synthetic. When mixing clean speech and noisy corpora to create the synthetic datasets, domain mismatches occur between synthetic and real-world recordings of noisy speech or audio. Second, there is a trade-off between increasing speech enhancement performance and degrading speech recognition (ASR) performance. Thus, we propose an unsupervised loss function to tackle those two problems. Our function is developed by extending the MixIT loss function with speech recognition embedding and disentanglement loss. Our results show that the proposed function effectively improves the speech enhancement…
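
The abstract builds on the MixIT (mixture invariant training) loss. As a rough illustration of that starting point only, the sketch below computes a plain MixIT loss in NumPy: the estimated sources are remixed under every possible assignment to the two input mixtures, and the best-matching assignment defines the loss. The function name `mixit_loss`, the use of mean squared error as the signal-level loss, and the array shapes are assumptions made for illustration. The speech recognition embedding and disentanglement terms proposed in the paper are not included.

```python
import itertools

import numpy as np


def mixit_loss(mixtures, estimates):
    """Plain MixIT loss with an MSE signal loss (illustrative assumption).

    mixtures:  array of shape (2, T), the two reference mixtures.
    estimates: array of shape (M, T), sources estimated from their sum.
    Returns (loss, assignment), where assignment[m] is the index of the
    mixture that estimated source m is assigned to.
    """
    n_mix, n_src = mixtures.shape[0], estimates.shape[0]
    best_loss, best_assign = np.inf, None
    # Enumerate every way of assigning each source to exactly one mixture.
    for assign in itertools.product(range(n_mix), repeat=n_src):
        mixing = np.zeros((n_mix, n_src))
        mixing[list(assign), np.arange(n_src)] = 1.0
        remixed = mixing @ estimates  # shape (2, T)
        loss = np.mean((mixtures - remixed) ** 2)
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Toy usage: three random "sources", two of which form the first mixture.
rng = np.random.default_rng(0)
sources = rng.standard_normal((3, 100))
mixtures = np.stack([sources[0] + sources[2], sources[1]])
loss, assignment = mixit_loss(mixtures, sources)
```

The exhaustive search covers 2^M assignments, which is workable only for a small number of output sources M. In training, `estimates` would come from a separation network applied to the sum of the two mixtures rather than from known sources.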
