Data augmentation and loss normalization for deep noise suppression
Sebastian Braun, Ivan Tashev

TL;DR
This paper explores data augmentation and loss normalization techniques to improve deep neural network-based speech enhancement, demonstrating that spectral, dynamic, and SNR augmentation combined with sequence-level normalization enhances training stability and performance.
Contribution
It introduces a novel combination of data augmentation strategies and sequence-level normalization of signal-based loss functions for training speech enhancement neural networks.
Findings
Augmenting the SNR range as well as the spectral and dynamic level diversity of the training data improves regularization.
Sequence-level loss normalization mitigates the training degradation caused by imbalanced signal levels.
Experiments demonstrate improved speech enhancement performance.
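The three augmentations named in the findings can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline. The random second-order (biquad) filter with coefficients drawn uniformly from [-3/8, 3/8] is borrowed from RNNoise, and the SNR range of -5 to 25 dB and the level range of -45 to -15 dB (mean-square level relative to 1.0) are assumptions chosen for illustration. The function names `random_biquad`, `apply_biquad`, `mean_power`, and `augment_pair` are hypothetical.

```python
import numpy as np


def random_biquad(rng, max_coef=0.375):
    """Random filter H(z) = (1 + r1 z^-1 + r2 z^-2) / (1 + r3 z^-1 + r4 z^-2).

    With |r| <= 3/8 the denominator satisfies the stability triangle
    (|a2| < 1 and |a1| < 1 + a2), so the filter is always stable.
    """
    r = rng.uniform(-max_coef, max_coef, size=4)
    b = np.array([1.0, r[0], r[1]])
    a = np.array([1.0, r[2], r[3]])
    return b, a


def apply_biquad(x, b, a):
    """Direct-form I IIR filtering, assuming a[0] == 1."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y[n] = yn
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def mean_power(x):
    """Mean-square power with a tiny floor to avoid log/division by zero."""
    return float(np.mean(x ** 2)) + 1e-12


def augment_pair(speech, noise, rng, snr_range=(-5.0, 25.0), level_range=(-45.0, -15.0)):
    """Hypothetical augmentation: spectral filter, continuous SNR, random level."""
    # 1) Spectral augmentation: independent random filters for speech and noise.
    speech = apply_biquad(speech, *random_biquad(rng))
    noise = apply_biquad(noise, *random_biquad(rng))
    # 2) SNR augmentation: draw the SNR from a continuous distribution.
    snr_db = rng.uniform(*snr_range)
    noise = noise * np.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    # 3) Dynamic level augmentation: one common gain sets the mixture level.
    mixture = speech + noise
    level_db = rng.uniform(*level_range)
    gain = 10 ** ((level_db - 10 * np.log10(mean_power(mixture))) / 20)
    return mixture * gain, speech * gain, noise * gain, snr_db, level_db
```

Filtering speech and noise with independent random filters varies their spectral balance, the continuous SNR draw replaces a handful of fixed SNR steps, and the final common gain changes the absolute level without altering the SNR.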
Abstract
Speech enhancement using neural networks has recently received considerable attention in research and is being integrated into commercial devices and applications. In this work, we investigate data augmentation techniques for supervised deep learning-based speech enhancement. We show that augmenting not only the SNR values, to a broader range and a continuous distribution, but also the spectral and dynamic level diversity helps to regularize training. However, to prevent level augmentation from degrading training, we propose a modification to signal-based loss functions that applies sequence-level normalization. We show in experiments that this normalization overcomes the degradation caused by training on sequences with imbalanced signal levels when a level-dependent loss function is used.
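The sequence-level normalization described in the abstract can be sketched as follows. This assumes a power-law compressed spectral magnitude MSE as the level-dependent loss and the standard deviation of each clean target sequence as the normalizer. Both choices, the STFT settings, and the names `stft_mag`, `spectral_loss`, `seq_normalized_loss`, and `batch_loss` are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np


def stft_mag(x, frame=512, hop=256):
    """Magnitude STFT with a Hann window (assumed analysis settings)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def spectral_loss(target, estimate, c=0.3):
    """Power-law compressed magnitude MSE; its value depends on signal level."""
    diff = stft_mag(target) ** c - stft_mag(estimate) ** c
    return float(np.mean(diff ** 2))


def seq_normalized_loss(target, estimate, c=0.3, eps=1e-8):
    """Divide both signals by the target's per-sequence std before the loss."""
    sigma = np.std(target) + eps
    return spectral_loss(target / sigma, estimate / sigma, c)


def batch_loss(targets, estimates, normalize=True):
    """Average the per-sequence losses over a batch."""
    loss_fn = seq_normalized_loss if normalize else spectral_loss
    return float(np.mean([loss_fn(t, e) for t, e in zip(targets, estimates)]))
```

Because target and estimate are divided by the same per-sequence constant, scaling a sequence by any gain leaves its normalized loss essentially unchanged, whereas the unnormalized compressed loss grows by the gain raised to the power 2c. Without normalization, loud sequences would therefore dominate the batch average.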
