Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

Yangyang Xia; Buye Xu; Anurag Kumar

arXiv:2109.05172 · eess.AS · September 22, 2021 · 1 citation

TL;DR

This paper introduces a semi-supervised method that enables neural speech enhancement systems to leverage real-world noisy speech data, improving their robustness in practical scenarios.

Contribution

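The paper proposes a semi-supervised training approach that first trains a modified vector-quantized variational autoencoder (VQ-VAE) on a source separation task, then uses that autoencoder to compute a triplet-based unsupervised loss, so that an enhancement network can also be trained on real-world noisy speech for which no clean reference exists.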
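
As background for the VQ-VAE mentioned above, here is a minimal NumPy sketch of the standard vector-quantization step from the original VQ-VAE formulation: each continuous encoder output is replaced by its nearest entry in a learned codebook. This is not the paper's modified autoencoder, whose details are not described on this page; the function name `vector_quantize` and the toy codebook are illustrative assumptions. In actual training this lookup is paired with a straight-through gradient estimator and codebook/commitment losses, which are omitted here.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z:        (T, D) array of continuous encoder outputs (e.g. one vector per frame)
    codebook: (K, D) array of code vectors (learned in a real VQ-VAE)
    Returns the (T,) indices of the chosen codes and the (T, D) quantized vectors.
    """
    # Squared Euclidean distance between every z_t and every codebook entry e_k.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Toy example: three 2-D "frames" and a hypothetical 3-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])
idx, zq = vector_quantize(z, codebook)
print(idx)  # [0 1 2]
```

Because the decoder only ever sees codebook vectors, the latent space is discrete, which is one reason VQ-VAEs are attractive as compact learned representations of speech.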

Findings

01 Promising results in real-world noisy speech scenarios
02 Effective training with real-world data using the proposed method
03 Improved speech enhancement performance over traditional supervised methods

Abstract

Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training. This setting prohibits the use of real-world degraded speech data that may better represent the scenarios where such systems are used. In this paper, we explore methods that enable supervised speech enhancement systems to train on real-world degraded speech data. Specifically, we propose a semi-supervised approach for speech enhancement in which we first train a modified vector-quantized variational autoencoder that solves a source separation task. We then use this trained autoencoder to further train an enhancement network using real-world noisy speech data by computing a triplet-based unsupervised loss function. Experiments show promising results for incorporating real-world data in training speech enhancement systems.
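
The triplet-based unsupervised loss mentioned in the abstract builds on the generic triplet margin loss, sketched below in NumPy assuming Euclidean distances and a hinge with a fixed margin (the same hinge form as PyTorch's `torch.nn.TripletMarginLoss`). The abstract does not say how the triplets are formed from the pretrained autoencoder's outputs, so `triplet_loss` and the toy embeddings are hypothetical placeholders, not the authors' loss.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss, averaged over a batch.

    Each argument is an (N, D) array of embeddings. A row contributes zero
    once the anchor is closer to its positive than to its negative by at
    least `margin`; otherwise it is penalized linearly.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distances
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distances
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Toy batch of two triplets (values are illustrative only).
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.0, 1.0], [2.0, 0.0]])
negative = np.array([[3.0, 0.0], [0.0, 1.0]])
print(triplet_loss(anchor, positive, negative))  # 1.0
```

The first triplet already satisfies the margin (1 - 3 + 1 < 0) and contributes nothing, while the second violates it (2 - 1 + 1 = 2), giving a batch mean of 1.0. A loss of this kind needs no clean reference signal, only embeddings, which is what makes it usable on real-world noisy recordings.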

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing