Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement
Muhammad Salman Khan, Moreno La Quatra, Kuo-Hsuan Hung, Szu-Wei Fu, Sabato Marco Siniscalchi, Yu Tsao

TL;DR
This paper introduces an SSL-based speech enhancement method that combines Transformer (conformer)-based masking, a consistency-preserving loss, and perceptual contrast stretching, and it achieves a state-of-the-art PESQ score on VoiceBank-DEMAND.
Contribution
It proposes an SSL-based speech enhancement framework that integrates conformer-based mask estimation, a consistency-preserving loss, and perceptual contrast stretching (PCS).
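The masking component can be illustrated with the Ideal Ratio Mask (IRM), which the abstract names as the target the conformer layers learn to produce. This is a minimal sketch that assumes the common power-ratio definition, IRM = (|S|^2 / (|S|^2 + |N|^2))^beta with beta = 0.5. This summary does not say which IRM variant the paper uses, and the sketch does not show the network that predicts the mask. The helper names `ideal_ratio_mask` and `apply_mask` are hypothetical.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-8):
    """Per-bin IRM = (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    Assumption: the common power-ratio IRM with beta = 0.5; the paper's
    exact variant is not given in this summary.
    """
    s_pow = np.square(speech_mag)
    n_pow = np.square(noise_mag)
    # eps avoids 0/0 in bins where both speech and noise are silent.
    return (s_pow / (s_pow + n_pow + eps)) ** beta

def apply_mask(noisy_mag, mask):
    """Mask-based enhancement: scale each noisy time-frequency bin by the mask."""
    return mask * noisy_mag

# Toy magnitudes (frequency x time), purely illustrative.
speech = np.array([[1.0, 0.0, 3.0], [2.0, 1.0, 0.0]])
noise = np.array([[0.0, 1.0, 4.0], [2.0, 0.0, 1.0]])
mask = ideal_ratio_mask(speech, noise)
```

Bins dominated by speech get a mask near 1, and bins dominated by noise get a mask near 0. In a masking-based system, a model (here, conformer layers on SSL features) would be trained to predict such a mask from the noisy input. `apply_mask` would then produce the enhanced magnitude.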
Findings
Achieved a PESQ score of 3.54 on VoiceBank-DEMAND, surpassing previous SSL-based methods.
Demonstrated the effectiveness of the consistency-preserving loss for speech enhancement.
Showed that perceptual contrast stretching, which raises the contrast of input and target features according to perceptual importance, improves enhancement quality.
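PCS can be sketched as a per-frequency stretch in the log-magnitude domain. The formulation here is an assumption, because the summary does not specify it: log1p(|X|) is multiplied by a frequency-dependent factor of at least 1, then mapped back with expm1. The factor values below are hypothetical placeholders, not the paper's perceptual-importance weights, and `pcs_stretch` is an illustrative name.

```python
import numpy as np

def pcs_stretch(mag, gamma):
    """Perceptual contrast stretching sketch on a magnitude spectrogram (freq x time).

    Assumption: each frequency bin's log1p-compressed magnitude is scaled by
    gamma[f] >= 1, then mapped back to the linear domain with expm1.
    """
    gamma = np.asarray(gamma, dtype=float)[:, None]  # broadcast over time frames
    log_mag = np.log1p(mag)
    return np.expm1(gamma * log_mag)

# Hypothetical weights: bin 0 left unchanged, bin 1 stretched.
toy_mag = np.array([[0.0, 1.0], [0.0, 1.0]])
stretched = pcs_stretch(toy_mag, [1.0, 1.5])
```

A factor of 1 leaves a bin unchanged. A factor above 1 multiplies the log-domain gap between quiet and loud values in that bin, which is the contrast being stretched. Silent bins stay at 0 because log1p(0) = 0.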
Abstract
Self-supervised representation learning (SSL) has attained SOTA results on several downstream speech tasks, but SSL-based speech enhancement (SE) solutions still lag behind. To address this issue, we exploit three main ideas: (i) Transformer-based masking generation, (ii) consistency-preserving loss, and (iii) perceptual contrast stretching (PCS). In detail, conformer layers, leveraging an attention mechanism, are introduced to effectively model frame-level representations and obtain the Ideal Ratio Mask (IRM) for SE. Moreover, we incorporate consistency in the loss function, which processes the input to account for the inconsistency effects of signal reconstruction from the spectrogram. Finally, PCS is employed to improve the contrast of input and target features according to perceptual importance. Evaluated on the VoiceBank-DEMAND task, the proposed solution outperforms previously…
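The consistency idea can be sketched as follows. A masked magnitude combined with the noisy phase is usually not the STFT of any real signal. The sketch therefore computes the loss after an iSTFT-then-STFT round trip, which maps the estimate to the spectrogram of the waveform it actually produces. Three assumptions apply: magnitude MSE is the distance, the noisy phase is reused, and NumPy stands in for a differentiable framework. A training implementation would need differentiable transforms such as PyTorch's `torch.stft`/`torch.istft`. All function names and STFT settings here are hypothetical.

```python
import numpy as np

def stft(x, n_fft=64, hop=16):
    """Hann-windowed STFT of a 1-D signal, returned as (freq, time)."""
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(spec, n_fft=64, hop=16):
    """Least-squares inverse STFT via weighted overlap-add (Griffin-Lim style)."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * win  # synthesis window
    length = n_fft + hop * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def consistency_preserving_loss(est_mag, noisy_phase, clean_mag, n_fft=64, hop=16):
    """Magnitude MSE measured on the re-analysed (consistent) estimate.

    est_mag * exp(j * noisy_phase) may be an inconsistent spectrogram; the
    istft -> stft round trip replaces it with the STFT of the signal that
    would actually be resynthesised, so the loss scores that signal.
    """
    est_spec = est_mag * np.exp(1j * noisy_phase)
    reanalysed = stft(istft(est_spec, n_fft, hop), n_fft, hop)
    return np.mean((np.abs(reanalysed) - clean_mag) ** 2)

# Toy usage with synthetic signals and a toy oracle magnitude-ratio mask.
rng = np.random.default_rng(0)
clean = rng.standard_normal(1024)
noisy = clean + 0.3 * rng.standard_normal(1024)
S, Y = stft(clean), stft(noisy)
mask = np.clip(np.abs(S) / (np.abs(Y) + 1e-8), 0.0, 1.0)
loss = consistency_preserving_loss(mask * np.abs(Y), np.angle(Y), np.abs(S))
```

A spectrogram computed from a real signal survives the round trip unchanged. An arbitrary one does not, and that gap is what the consistency-aware loss accounts for.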
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research
