Rethinking Speech Representation Aggregation in Speech Enhancement: A Phonetic Mutual Information Perspective
Seungu Han, Sungho Lee, Kyogu Lee

TL;DR
This paper proposes an approach based on phonetic mutual information that helps speech enhancement models preserve linguistic content in noisy conditions, leading to improved recognition accuracy.
Contribution
The paper introduces a linguistic aggregation layer that is pre-trained to maximize mutual information with phoneme labels. Its training is decoupled from that of the SE model, so that semantic content is better preserved.
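The summary does not say what form the aggregation layer takes. As a rough illustration only, the sketch below assumes a softmax-weighted sum over SSL layers, a common way to merge intermediate features. The names `aggregate_layers` and `layer_logits` are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_feats, layer_logits):
    """Merge per-layer features into one representation.

    layer_feats: list of L layers, each a list of T frames,
                 each frame a list of D floats.
    layer_logits: L unnormalized layer weights (assumed to be learned
                  beforehand and kept fixed while the SE model trains).
    """
    weights = softmax(layer_logits)
    num_frames = len(layer_feats[0])
    dim = len(layer_feats[0][0])
    out = [[0.0] * dim for _ in range(num_frames)]
    for weight, feats in zip(weights, layer_feats):
        for t in range(num_frames):
            for d in range(dim):
                out[t][d] += weight * feats[t][d]
    return out


# Two layers, one frame, two dimensions; equal logits give a plain average.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
merged = aggregate_layers(features, [0.0, 0.0])
```

In a decoupled setup of the kind described, the weights would be fitted against a phoneme-related objective first and then left untouched during SE training; here they are simply passed in.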
Findings
Lower word error rate (WER) than baseline models.
Mutual information analysis reveals better linguistic content preservation.
Decoupled training enhances robustness to noise.
Abstract
Recent speech enhancement (SE) models increasingly leverage self-supervised learning (SSL) representations for their rich semantic information. Typically, intermediate features are aggregated into a single representation via a lightweight adaptation module. However, most SSL models are not trained for noise robustness, which can lead to corrupted semantic representations. Moreover, the adaptation module is trained jointly with the SE model, potentially prioritizing acoustic details over semantic information, contradicting the original purpose. To address this issue, we first analyze the behavior of SSL models on noisy speech from an information-theoretic perspective. Specifically, we measure the mutual information (MI) between the corrupted SSL representations and the corresponding phoneme labels, focusing on preservation of linguistic content. Building upon this analysis, we introduce…
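The excerpt does not state which MI estimator the authors use. As a minimal sketch of the quantity being measured, the example below computes a plug-in estimate of I(Z; Y) = Σ p(z, y) log2 [p(z, y) / (p(z) p(y))], under the assumption that frame-level SSL representations have already been quantized to discrete codes (for instance by clustering) and aligned frame by frame with phoneme labels. The function name and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(codes, phonemes):
    """Plug-in estimate of I(codes; phonemes) in bits.

    codes:    discrete code per frame (assumed quantized SSL features).
    phonemes: phoneme label per frame, aligned with codes.
    """
    if len(codes) != len(phonemes):
        raise ValueError("sequences must be frame-aligned and equal in length")
    n = len(codes)
    joint_counts = Counter(zip(codes, phonemes))
    code_counts = Counter(codes)
    phone_counts = Counter(phonemes)
    mi = 0.0
    for (code, phone), count in joint_counts.items():
        # p(z, y) * log2( p(z, y) / (p(z) * p(y)) ), written with raw counts.
        ratio = count * n / (code_counts[code] * phone_counts[phone])
        mi += (count / n) * math.log2(ratio)
    return mi


# Codes that track the phonemes exactly carry full label information.
clean_mi = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
# Codes unrelated to the phonemes carry none.
noisy_mi = mutual_information([0, 1, 0, 1], ["a", "a", "b", "b"])
```

Comparing such an estimate for clean and noisy inputs gives one way to see how much phonetic information a representation loses under noise. Plug-in estimates are biased upward on small samples, so real use would need many frames.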
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis
