Phoneme-based Distribution Regularization for Speech Enhancement
Yajing Liu, Xiulian Peng, Zhiwei Xiong, Yan Lu

TL;DR
This paper introduces a phoneme-based distribution regularization method that leverages phoneme information to improve speech enhancement quality and ASR accuracy by modulating feature distributions.
Contribution
It proposes a novel PbDr module that incorporates phoneme classification vectors into speech enhancement networks for better feature regularization.
Findings
Boosts perceptual speech quality
Improves ASR recognition accuracy
Can be integrated into existing networks
Abstract
Existing speech enhancement methods mainly separate speech from noise at the signal level or in the time-frequency domain. They seldom pay attention to the semantic information of a corrupted signal. In this paper, we aim to bridge this gap by extracting phoneme identities to help speech enhancement. Specifically, we propose a phoneme-based distribution regularization (PbDr) for speech enhancement, which incorporates frame-wise phoneme information into the speech enhancement network in a conditional manner. As different phonemes always lead to different feature distributions in frequency, we propose to learn a parameter pair, i.e., scale and bias, from a phoneme classification vector to modulate the speech enhancement network. The modulation parameter pair includes not only frame-wise but also frequency-wise conditions, which effectively map features to phoneme-related distributions. In…
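The scale-and-bias modulation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `pbdr_modulate`, the tensor shapes, and the use of a single linear map from the phoneme classification vector to the parameters are all assumptions made for illustration. It only shows how one scale and one bias per frame and per frequency bin could be derived from frame-wise phoneme posteriors and applied to intermediate features.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pbdr_modulate(features, phoneme_probs, w_scale, b_scale, w_bias, b_bias):
    """Hypothetical phoneme-conditioned modulation (assumed shapes).

    features:      (T, F, C) intermediate features (frames, frequency bins, channels)
    phoneme_probs: (T, P) frame-wise phoneme classification vectors
    w_scale, w_bias: (P, F) assumed linear maps from phonemes to frequency bins
    b_scale, b_bias: (F,) offsets for the scale and the bias
    """
    # One scale and one bias per frame and per frequency bin.
    scale = phoneme_probs @ w_scale + b_scale   # (T, F)
    bias = phoneme_probs @ w_bias + b_bias      # (T, F)
    # Broadcast over the channel axis and apply the affine modulation.
    return features * scale[:, :, None] + bias[:, :, None]

rng = np.random.default_rng(0)
T, F, C, P = 5, 8, 4, 40  # illustrative sizes, not taken from the paper

features = rng.standard_normal((T, F, C))
phoneme_probs = softmax(rng.standard_normal((T, P)))

# Randomly initialized stand-ins for parameters that would be learned.
w_scale = 0.1 * rng.standard_normal((P, F))
b_scale = np.ones(F)
w_bias = 0.1 * rng.standard_normal((P, F))
b_bias = np.zeros(F)

out = pbdr_modulate(features, phoneme_probs, w_scale, b_scale, w_bias, b_bias)
print(out.shape)
```

With zero weights, a scale offset of one, and a bias offset of zero, the sketch reduces to the identity, so the modulation can be read as a learned, phoneme-dependent deviation from the unmodified features.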
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
