Towards Robust Waveform-Based Acoustic Models
Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, and Bin Yu

TL;DR
This paper proposes a waveform-based data augmentation method, grounded in vicinal risk minimization, that improves the robustness of acoustic models in adverse environments and yields a reported 150% relative improvement in out-of-distribution generalization.
Contribution
The paper introduces a theoretically motivated data augmentation scheme, based on Gaussian mixtures, that enhances the robustness of waveform-based acoustic models to environmental variability.
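To make the idea of Gaussian-mixture augmentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that every mixture component is centered at the training waveform and has an isotropic covariance; the function name `gaussian_mixture_vicinity` and the parameters `sigmas`, `weights`, and `n_samples` are hypothetical and chosen only for illustration. The paper's actual choice of mixture components may differ.

```python
import numpy as np


def gaussian_mixture_vicinity(waveform, sigmas=(0.01, 0.05), weights=(0.5, 0.5),
                              n_samples=4, rng=None):
    """Draw perturbed copies of `waveform` from a Gaussian mixture centered at it.

    Assumption (illustrative only): each component is an isotropic Gaussian
    N(waveform, sigma_k^2 I), selected with probability weights[k].
    """
    rng = np.random.default_rng() if rng is None else rng
    waveform = np.asarray(waveform, dtype=np.float64)
    sigmas = np.asarray(sigmas, dtype=np.float64)

    # Pick one mixture component for each augmented copy.
    components = rng.choice(len(sigmas), size=n_samples, p=weights)

    # Standard normal noise, scaled by the standard deviation of the chosen component.
    noise = rng.standard_normal((n_samples, waveform.shape[0]))
    return waveform[None, :] + sigmas[components][:, None] * noise


if __name__ == "__main__":
    # Toy "waveform": 10 ms of a 440 Hz tone at a 16 kHz sampling rate.
    t = np.arange(160) / 16000.0
    x = np.sin(2.0 * np.pi * 440.0 * t)
    augmented = gaussian_mixture_vicinity(x, n_samples=3, rng=np.random.default_rng(0))
    print(augmented.shape)
```

Each row of the returned array is one augmented copy that would share the label of the original sample during training. Larger `sigmas` correspond to wider neighborhoods around the training sample.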
Findings
150% relative improvement in out-of-distribution generalization
Effective in unseen noise conditions
Competitive with models trained on matched acoustic data
Abstract
We study the problem of learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions. This problem is of paramount importance for the deployment of speech recognition systems that need to perform well in unseen environments. First, we characterize data augmentation theoretically as an instance of vicinal risk minimization, which aims at improving risk estimates during training by replacing the delta functions that define the empirical density over the input space with an approximation of the marginal population density in the vicinity of the training samples. More specifically, we assume that local neighborhoods centered at training samples can be approximated using a mixture of Gaussians, and demonstrate theoretically that this can incorporate robust inductive bias into the learning process. We then specify the…
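The vicinal risk idea described in the abstract can be written down compactly. The notation here is an assumption made for illustration and follows the standard formulation of vicinal risk minimization rather than the paper's own symbols: $f$ is the model, $\ell$ the loss, $(x_i, y_i)$ the $n$ training pairs, and $\pi_k$, $\Sigma_k$ the weights and covariances of a $K$-component mixture.

```latex
% Empirical risk: the input density is a sum of delta functions at the training samples.
R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big),
\qquad
\hat{p}_{\mathrm{emp}}(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(x)

% Vicinal density: each delta function is replaced by a Gaussian mixture centered at x_i.
\nu_i(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\big(x;\, x_i, \Sigma_k\big),
\qquad
\pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Vicinal risk: the loss is averaged over the neighborhood of each training sample.
R_{\mathrm{vic}}(f) = \frac{1}{n} \sum_{i=1}^{n} \int \ell\big(f(x), y_i\big) \, \nu_i(x) \, \mathrm{d}x
```

In practice the integral is typically approximated by sampling from $\nu_i$, which is exactly what a data augmentation step does. As $\Sigma_k \to 0$ for all $k$, the vicinal risk reduces to the empirical risk.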
