PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement
Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

TL;DR
This paper introduces PAAPLoss, an auxiliary loss function for speech enhancement that pushes the acoustic parameters of enhanced speech toward those of clean speech, weighted per phoneme, to improve perceptual quality and interpretability.
Contribution
The paper proposes a neural network estimator that predicts the time-series values of otherwise non-differentiable acoustic parameters, and it adds phoneme-specific weights for each parameter so the loss can be used to improve speech quality.
Findings
Improves speech enhancement performance in both time and frequency domains
Enhances perceptual quality as measured by standard metrics
Provides interpretability through phoneme-dependent analysis
Abstract
Despite rapid advancement in recent years, current speech enhancement models often produce speech that differs in perceptual quality from real clean speech. We propose a learning objective that formalizes differences in perceptual quality, by using domain knowledge of acoustic-phonetics. We identify temporal acoustic parameters -- such as spectral tilt, spectral flux, shimmer, etc. -- that are non-differentiable, and we develop a neural network estimator that can accurately predict their time-series values across an utterance. We also model phoneme-specific weights for each feature, as the acoustic parameters are known to show different behavior in different phonemes. We can add this criterion as an auxiliary loss to any model that produces speech, to optimize speech outputs to match the values of clean speech in these features. Experimentally we show that it improves speech enhancement…
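The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the absolute-difference form, the way phoneme weights are mixed per frame, and the `aux_scale` hyperparameter are all assumptions made for illustration. It assumes the acoustic parameters have already been predicted per frame by some estimator, and that a per-frame phoneme alignment is available.

```python
def phoneme_weighted_param_loss(params_enh, params_clean, phoneme_align, weights):
    """Hypothetical sketch of a phoneme-weighted acoustic parameter loss.

    params_enh, params_clean: T frames x P acoustic parameter estimates
    phoneme_align: T frames x K phoneme posteriors (or one-hot labels)
    weights: K phonemes x P per-parameter weights
    """
    num_frames = len(params_clean)
    num_params = len(params_clean[0])
    num_phonemes = len(weights)
    total = 0.0
    for t in range(num_frames):
        for p in range(num_params):
            # Weight for this frame and parameter: the phoneme-specific
            # weights mixed according to the frame's phoneme alignment.
            w = sum(phoneme_align[t][k] * weights[k][p] for k in range(num_phonemes))
            # Assumed distance: absolute difference between enhanced and clean.
            total += w * abs(params_enh[t][p] - params_clean[t][p])
    return total / (num_frames * num_params)


def total_loss(base_loss, aux_loss, aux_scale=0.1):
    # aux_scale is an assumed hyperparameter, not a value from the paper.
    return base_loss + aux_scale * aux_loss
```

In a real training setup the same computation would be written in a differentiable framework so that gradients flow through the estimator back to the enhancement model; the plain-Python version here only shows the arithmetic.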
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
