Phoneme-based Distribution Regularization for Speech Enhancement

Yajing Liu; Xiulian Peng; Zhiwei Xiong; Yan Lu

arXiv:2104.03759 · eess.AS · April 9, 2021 · 1 citation

TL;DR

This paper introduces a phoneme-based distribution regularization method that leverages phoneme information to improve speech enhancement quality and ASR accuracy by modulating feature distributions.

Contribution

The paper proposes a PbDr module that feeds frame-wise phoneme classification vectors into a speech enhancement network, using them to predict a scale and bias pair that modulates the network's features.

Findings

01. Boosts perceptual speech quality
02. Improves ASR recognition accuracy
03. Can be integrated into existing networks

Abstract

Existing speech enhancement methods mainly separate speech from noise at the signal level or in the time-frequency domain. They seldom pay attention to the semantic information of a corrupted signal. In this paper, we aim to bridge this gap by extracting phoneme identities to help speech enhancement. Specifically, we propose a phoneme-based distribution regularization (PbDr) for speech enhancement, which incorporates frame-wise phoneme information into a speech enhancement network in a conditional manner. As different phonemes always lead to different feature distributions in frequency, we propose to learn a parameter pair, i.e., scale and bias, through a phoneme classification vector to modulate the speech enhancement network. The modulation parameter pair includes not only frame-wise but also frequency-wise conditions, which effectively map features to phoneme-related distributions. In…
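
The scale-and-bias modulation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `phoneme_modulate`, the linear projections `w_scale` and `w_bias`, the tensor shapes, and the plain affine form `features * scale + bias` are all assumptions chosen for illustration. The sketch only shows how a per-frame phoneme classification vector could yield a condition that varies over both frames and frequency bins.

```python
import numpy as np

def phoneme_modulate(features, phoneme_probs, w_scale, w_bias):
    """Modulate features with a phoneme-conditioned scale and bias (hypothetical form).

    features:      (channels, frames, freq_bins) intermediate enhancement features
    phoneme_probs: (frames, num_phonemes) per-frame phoneme classification vector
    w_scale:       (num_phonemes, freq_bins) assumed learned projection for the scale
    w_bias:        (num_phonemes, freq_bins) assumed learned projection for the bias
    """
    # One scale and one bias per (frame, frequency bin) pair.
    scale = phoneme_probs @ w_scale  # (frames, freq_bins)
    bias = phoneme_probs @ w_bias    # (frames, freq_bins)
    # Broadcast the same condition over the channel axis.
    return features * scale[None, :, :] + bias[None, :, :]

# Toy inputs; all sizes are arbitrary and chosen only for the example.
rng = np.random.default_rng(0)
channels, frames, freq_bins, num_phonemes = 4, 10, 16, 40
features = rng.standard_normal((channels, frames, freq_bins))
logits = rng.standard_normal((frames, num_phonemes))
# Softmax over phoneme classes gives a per-frame classification vector.
phoneme_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
w_scale = rng.standard_normal((num_phonemes, freq_bins))
w_bias = rng.standard_normal((num_phonemes, freq_bins))

out = phoneme_modulate(features, phoneme_probs, w_scale, w_bias)
print(out.shape)
```

In a trained network the two projections would be learned jointly with the enhancement model, and the phoneme vector would come from a phoneme classifier rather than random logits; both points are left out of the sketch.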

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development