Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

Kuan-Lin Chen; Daniel D. E. Wong; Ke Tan; Buye Xu; Anurag Kumar; Vamsi Krishna Ithapu

arXiv:2211.08624 · cs.SD · March 9, 2023

TL;DR

This paper introduces a novel speech enhancement method that models heteroscedastic uncertainty using a multivariate Gaussian likelihood, leading to improved performance by adaptively weighting the loss based on uncertainty estimates.

Contribution

The paper incorporates heteroscedastic uncertainty modeling into complex spectral mapping for speech enhancement and reports that the resulting loss outperforms the MSE, MAE, and SI-SDR loss functions.

Findings

01 Modeling heteroscedastic uncertainty improves speech enhancement (SE) performance.

02 The approach outperforms the MSE, MAE, and SI-SDR loss functions.

03 Weakening covariance assumptions enhances the effectiveness of the NLL loss.

Abstract

Most speech enhancement (SE) models learn a point estimate and do not make use of uncertainty estimation in the learning process. In this paper, we show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost. During training, our approach augments a model learning complex spectral mapping with a temporary submodel to predict the covariance of the enhancement error at each time-frequency bin. Due to unrestricted heteroscedastic uncertainty, the covariance introduces an undersampling effect, detrimental to SE performance. To mitigate undersampling, our approach inflates the uncertainty lower bound and weights each loss component with its uncertainty, effectively compensating severely undersampled components with more penalties. Our multivariate setting reveals common covariance assumptions…
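To make the loss described in the abstract more concrete, the following is a minimal NumPy sketch of a per-bin bivariate Gaussian NLL over the real and imaginary parts of the enhancement error. It is not the authors' implementation: the function name `gaussian_nll_2d`, the `var_floor` parameter (a stand-in for the inflated uncertainty lower bound), and the covariance clipping are assumptions made for illustration, and the uncertainty-based weighting of loss components mentioned in the abstract is not reproduced here.

```python
import numpy as np

def gaussian_nll_2d(err_re, err_im, var_re, var_im, cov, var_floor=1e-3):
    """Bivariate Gaussian NLL per time-frequency bin (constant term dropped).

    err_re, err_im : real and imaginary parts of the enhancement error
    var_re, var_im : predicted variances of the two error components
    cov            : predicted covariance between the two components
    var_floor      : hypothetical lower bound on the variances (assumption)
    """
    # Enforce a lower bound on the predicted variances.
    var_re = np.maximum(var_re, var_floor)
    var_im = np.maximum(var_im, var_floor)

    # Keep the 2x2 covariance matrix positive definite by bounding |cov|.
    max_cov = 0.99 * np.sqrt(var_re * var_im)
    cov = np.clip(cov, -max_cov, max_cov)

    # Determinant of [[var_re, cov], [cov, var_im]].
    det = var_re * var_im - cov ** 2

    # Mahalanobis term e^T Sigma^{-1} e, using the closed-form 2x2 inverse.
    maha = (var_im * err_re ** 2
            - 2.0 * cov * err_re * err_im
            + var_re * err_im ** 2) / det

    return 0.5 * (np.log(det) + maha)
```

With unit variances and zero covariance, the log-determinant vanishes and the expression reduces to half the squared error per bin, which is how a homoscedastic assumption recovers an MSE-style loss. Setting `cov` to zero everywhere corresponds to a diagonal covariance assumption.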

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies