PatchDSU: Uncertainty Modeling for Out of Distribution Generalization in Keyword Spotting

Bronya Roni Chernyak; Yael Segal; Yosi Shrem; Joseph Keshet

arXiv:2508.03190 · eess.AS · August 6, 2025

TL;DR

This paper introduces PatchDSU, a method for improving out-of-distribution generalization in keyword spotting. It splits each input spectrogram into patches and augments each patch independently, making the model more robust to domain shifts.

Contribution

PatchDSU extends DSU by partitioning input spectrograms into patches and augmenting each patch independently, improving robustness to distribution shifts in keyword spotting.
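The summary does not say how patches are formed or exactly how DSU is applied inside each one. The sketch below assumes non-overlapping patches along the time axis and uses the per-channel formulation of the original DSU: each instance's mean and standard deviation are perturbed with Gaussian noise scaled by how much those statistics vary across the batch. The function names `dsu` and `patch_dsu` and the `patch_size` parameter are hypothetical, not the authors' code.

```python
import numpy as np


def dsu(x, rng, eps=1e-6):
    """DSU-style augmentation (sketch) for x of shape (batch, channels, freq, time)."""
    # Per-instance, per-channel statistics over the freq and time axes.
    mu = x.mean(axis=(2, 3), keepdims=True)                  # (B, C, 1, 1)
    sig = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)   # (B, C, 1, 1)
    # Uncertainty of those statistics, estimated as their spread across the batch.
    std_mu = np.sqrt(mu.var(axis=0, keepdims=True) + eps)    # (1, C, 1, 1)
    std_sig = np.sqrt(sig.var(axis=0, keepdims=True) + eps)  # (1, C, 1, 1)
    # Sample new statistics from Gaussians centred on the observed ones.
    beta = mu + rng.standard_normal(mu.shape) * std_mu
    gamma = sig + rng.standard_normal(sig.shape) * std_sig
    # Normalise with the original statistics, re-style with the sampled ones.
    return gamma * (x - mu) / sig + beta


def patch_dsu(x, patch_size, rng, eps=1e-6):
    """Apply dsu() independently to non-overlapping patches along the time axis.

    ASSUMPTION: time-axis patches of width patch_size (the last may be shorter);
    the summary does not specify the actual patch layout.
    """
    out = np.empty_like(x)
    n_frames = x.shape[-1]
    for start in range(0, n_frames, patch_size):
        stop = min(start + patch_size, n_frames)
        # Each patch gets its own statistics and its own noise draw.
        out[..., start:stop] = dsu(x[..., start:stop], rng, eps)
    return out


# Usage: a batch of 8 single-channel spectrograms, 40 frequency bins x 101 frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 1, 40, 101))
aug = patch_dsu(spec, patch_size=25, rng=rng)
print(aug.shape)  # (8, 1, 40, 101)
```

Because statistics are computed per patch, each time segment is re-styled with its own sampled mean and variance rather than sharing one utterance-level estimate. The original DSU applies the perturbation only during training and only with some probability per forward pass; that gate is omitted here for brevity.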

Findings

1. PatchDSU outperforms DSU and the other compared methods in most scenarios.
2. PatchDSU shows more consistent improvements across datasets.
3. Both PatchDSU and DSU improve out-of-domain generalization.

Abstract

Deep learning models excel at many tasks but rely on the assumption that training and test data follow the same distribution. This assumption often does not hold in real-world speech systems, where distribution shifts are common due to varying environments, recording conditions, and speaker diversity. Domain Shifts with Uncertainty (DSU) augments the input of each neural network layer based on the input feature statistics. It addresses out-of-domain generalization by assuming that feature statistics follow a multivariate Gaussian distribution and substituting the input with features sampled from this distribution. While effective for computer vision, applying DSU to speech presents challenges due to the nature of the data. Unlike static visual data, speech is a temporal signal commonly represented by a spectrogram, which shows how frequency content changes over time. This…
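The abstract describes DSU only at a high level. For reference, one way to write the per-channel formulation from the original DSU work is below, for a batch of B feature maps x_b with F frequency bins and T time frames (channel index omitted). The notation is ours, and treating each channel's mean and standard deviation as an independent Gaussian (a multivariate Gaussian with diagonal covariance) is our reading rather than a statement from this summary; δ is a small stability constant.

$$
\mu(x) = \frac{1}{FT}\sum_{f=1}^{F}\sum_{t=1}^{T} x_{f,t}, \qquad
\sigma(x) = \sqrt{\frac{1}{FT}\sum_{f=1}^{F}\sum_{t=1}^{T}\big(x_{f,t}-\mu(x)\big)^2 + \delta}
$$

$$
\Sigma_\mu^2 = \frac{1}{B}\sum_{b=1}^{B}\Big(\mu(x_b) - \frac{1}{B}\sum_{b'=1}^{B}\mu(x_{b'})\Big)^2, \qquad
\Sigma_\sigma^2 = \frac{1}{B}\sum_{b=1}^{B}\Big(\sigma(x_b) - \frac{1}{B}\sum_{b'=1}^{B}\sigma(x_{b'})\Big)^2
$$

$$
\beta = \mu(x) + \epsilon_\mu \Sigma_\mu, \qquad
\gamma = \sigma(x) + \epsilon_\sigma \Sigma_\sigma, \qquad
\epsilon_\mu, \epsilon_\sigma \sim \mathcal{N}(0, 1)
$$

$$
\mathrm{DSU}(x) = \gamma \, \frac{x - \mu(x)}{\sigma(x)} + \beta
$$

In the original DSU, DSU(x) replaces x only during training; at test time features pass through unchanged.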
