A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech

Li-Wei Chen; Yao-Fei Cheng; Hung-Shin Lee; Yu Tsao; Hsin-Min Wang

arXiv:2210.15368·cs.SD·May 23, 2023

TL;DR

This paper introduces a training and inference strategy for speech enhancement that uses both noisy and enhanced speech as training targets, so that models can be trained when clean speech is unavailable.

Contribution

It extends noisy-target training (NyTT) by also using enhanced speech as a training target, and it trains student models on remixes of a teacher model's estimated speech and noise so that the added noise is homogeneous with the in-domain noise.

Findings

01. Outperforms baseline methods in speech enhancement tasks.

02. Effective in scenarios with mismatched training and evaluation conditions.

03. Teacher/student inference further improves speech quality.

Abstract

The lack of clean speech is a practical challenge to the development of speech enhancement systems, which means that there is an inevitable mismatch between their training criterion and evaluation metric. In response to this unfavorable situation, we propose a training and inference strategy that additionally uses enhanced speech as a target by improving the previously proposed noisy-target training (NyTT). Because homogeneity between in-domain noise and extraneous noise is the key to the effectiveness of NyTT, we train various student models by remixing 1) the teacher model's estimated speech and noise for enhanced-target training or 2) raw noisy speech and the teacher model's estimated noise for noisy-target training. Experimental results show that our proposed method outperforms several baselines, especially with the teacher/student inference, where predicted clean speech is derived…
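The two remixing schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the teacher's noise estimate is the residual between the noisy input and the estimated speech, and that remixing means shuffling the estimated noise across utterances in a batch; neither detail is stated in the abstract. `toy_teacher` and `remix_batch` are hypothetical names used only for illustration.

```python
import numpy as np


def toy_teacher(noisy: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained teacher model.

    A real teacher would be a neural enhancement model; here the input is
    simply scaled so that the example runs without any trained weights.
    """
    return 0.8 * noisy


def remix_batch(noisy, teacher, rng, enhanced_target=True):
    """Build (student_input, student_target) pairs by remixing.

    enhanced_target=True : estimated speech + shuffled estimated noise,
                           with the estimated speech as target.
    enhanced_target=False: raw noisy speech + shuffled estimated noise,
                           with the raw noisy speech as target.
    """
    speech_hat = teacher(noisy)
    # Assumption: the noise estimate is the residual of the teacher output.
    noise_hat = noisy - speech_hat
    # Assumption: remixing pairs each target with noise from another utterance.
    perm = rng.permutation(noisy.shape[0])
    shuffled_noise = noise_hat[perm]

    target = speech_hat if enhanced_target else noisy
    student_input = target + shuffled_noise
    return student_input, target


rng = np.random.default_rng(0)
noisy = rng.standard_normal((4, 16000))  # 4 placeholder utterances

x_et, t_et = remix_batch(noisy, toy_teacher, rng, enhanced_target=True)
x_nt, t_nt = remix_batch(noisy, toy_teacher, rng, enhanced_target=False)
```

In both cases the student sees an input that is noisier than its target, and the added noise comes from the teacher's own estimates on the same data, which is one way to read the abstract's requirement that the extra noise be homogeneous with the in-domain noise.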

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Speech Recognition and Synthesis