PatchDSU: Uncertainty Modeling for Out of Distribution Generalization in Keyword Spotting
Bronya Roni Chernyak, Yael Segal, Yosi Shrem, Joseph Keshet

TL;DR
This paper introduces PatchDSU, a method that improves out-of-distribution generalization in keyword spotting by splitting spectrogram inputs into patches and augmenting each patch independently.
Contribution
PatchDSU extends DSU by partitioning input spectrograms into patches for independent augmentation, enhancing robustness to distribution shifts in speech recognition tasks.
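The patch-wise idea can be illustrated with a minimal sketch. It assumes a single 2-D spectrogram of shape (frequency, time) and non-overlapping rectangular patches. The function name `augment_patches`, the patch sizes, and the toy augmentation are hypothetical. The summary does not say how PatchDSU actually chooses patch shapes or at which layers it is applied.

```python
import numpy as np

def augment_patches(spec, patch_f, patch_t, augment):
    """Apply `augment` independently to each non-overlapping patch.

    spec     : 2-D array, shape (freq_bins, time_frames)
    patch_f  : patch height in frequency bins (assumed hyperparameter)
    patch_t  : patch width in time frames (assumed hyperparameter)
    augment  : callable mapping a patch to an array of the same shape
    """
    spec = np.asarray(spec, dtype=float)
    out = np.empty_like(spec)
    n_f, n_t = spec.shape
    for f0 in range(0, n_f, patch_f):
        for t0 in range(0, n_t, patch_t):
            # Edge patches may be smaller when sizes do not divide evenly.
            patch = spec[f0:f0 + patch_f, t0:t0 + patch_t]
            out[f0:f0 + patch_f, t0:t0 + patch_t] = augment(patch)
    return out

# Toy usage: shift each patch by its own random offset (a stand-in for a
# real statistics-based augmentation).
rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 101))
augmented = augment_patches(spec, 8, 20, lambda p: p + rng.normal(0.0, 0.1))
```

Because each patch draws its own perturbation, different time-frequency regions of the same utterance are shifted differently, which is the property the contribution statement emphasizes.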
Findings
PatchDSU outperforms DSU and other methods in most scenarios.
PatchDSU shows more consistent improvements across various datasets.
Both PatchDSU and DSU improve out-of-domain generalization.
Abstract
Deep learning models excel at many tasks but rely on the assumption that training and test data follow the same distribution. This assumption often does not hold in real-world speech systems, where distribution shifts are common due to varying environments, recording conditions, and speaker diversity. The method of Domain Shifts with Uncertainty (DSU) augments the input of each neural network layer based on the input feature statistics. It addresses the problem of out-of-domain generalization by assuming that feature statistics follow a multivariate Gaussian distribution and substituting the input with features sampled from this distribution. While effective for computer vision, applying DSU to speech presents challenges due to the nature of the data. Unlike static visual data, speech is a temporal signal commonly represented by a spectrogram, which shows how frequency content changes over time. This…
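The DSU step described in the abstract can be sketched roughly as follows. This is an illustrative NumPy version, not the authors' implementation. It assumes a feature tensor of shape (batch, channels, frequency, time), per-channel mean and standard deviation as the feature statistics, and a diagonal Gaussian whose spread is estimated across the batch. The function name `dsu_perturb` and the `eps` constant are hypothetical.

```python
import numpy as np

def dsu_perturb(x, rng, eps=1e-6):
    """Resample per-channel feature statistics, DSU-style (sketch).

    x   : array of shape (batch, channels, freq, time)
    rng : numpy.random.Generator used for sampling
    """
    # Per-instance, per-channel statistics over the spatial axes.
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)

    # Uncertainty of those statistics, estimated across the batch (assumption).
    mu_spread = np.sqrt(mu.var(axis=0, keepdims=True) + eps)
    sigma_spread = np.sqrt(sigma.var(axis=0, keepdims=True) + eps)

    # Sample new statistics from Gaussians centered on the observed ones.
    new_mu = mu + rng.standard_normal(mu.shape) * mu_spread
    new_sigma = sigma + rng.standard_normal(sigma.shape) * sigma_spread

    # Normalize with the original statistics, then re-style with the samples.
    return new_sigma * (x - mu) / sigma + new_mu

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 3, 8, 10))
perturbed = dsu_perturb(features, rng)
```

In this sketch the statistics are computed over the whole frequency-time plane of each channel, which treats the spectrogram like a static image.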
