I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization
Dianwen Ng, Jia Qi Yip, Tanmay Surana, Zhao Yang, Chong Zhang, Yukun, Ma, Chongjia Ni, Eng Siong Chng, Bin Ma

TL;DR
This paper introduces I2CR, a contrastive regularization technique that enhances noise robustness in keyword spotting by improving feature clustering and generalization across various noise conditions.
Contribution
The paper presents a novel Inter-Intra Contrastive Regularization method that improves feature representations and noise robustness in keyword spotting models.
Findings
Consistent accuracy improvements across different models and noise environments.
Enhanced performance on unseen out-of-domain noises.
Better robustness to various noise SNRs.
Abstract
Noise robustness in keyword spotting remains a challenge as many models fail to overcome the heavy influence of noises, causing the deterioration of the quality of feature embeddings. We proposed a contrastive regularization method called Inter-Intra Contrastive Regularization (I2CR) to improve the feature representations by guiding the model to learn the fundamental speech information specific to the cluster. This involves maximizing the similarity across Intra and Inter samples of the same class. As a result, it pulls the instances closer to more generalized representations that form more prominent clusters and reduces the adverse impact of noises. We show that our method provides consistent improvements in accuracy over different backbone model architectures under different noise environments. We also demonstrate that our proposed framework has improved the accuracy of unseen…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
