I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization

Dianwen Ng; Jia Qi Yip; Tanmay Surana; Zhao Yang; Chong Zhang; Yukun Ma; Chongjia Ni; Eng Siong Chng; Bin Ma

arXiv:2209.06360·cs.SD·September 15, 2022

Open Access

TL;DR

This paper introduces I2CR, a contrastive regularization technique that enhances noise robustness in keyword spotting by improving feature clustering and generalization across various noise conditions.

Contribution

The paper presents a novel Inter-Intra Contrastive Regularization method that improves feature representations and noise robustness in keyword spotting models.

Findings

01 Consistent accuracy improvements across different models and noise environments.

02 Enhanced performance on unseen out-of-domain noises.

03 Better robustness across a range of signal-to-noise ratios (SNRs).

Abstract

Noise robustness in keyword spotting remains a challenge: many models fail to overcome the heavy influence of noise, which degrades the quality of their feature embeddings. We propose a contrastive regularization method, Inter-Intra Contrastive Regularization (I2CR), that improves feature representations by guiding the model to learn the fundamental speech information specific to each cluster. This involves maximizing the similarity across intra and inter samples of the same class. As a result, it pulls instances closer to more generalized representations that form more prominent clusters, reducing the adverse impact of noise. We show that our method provides consistent accuracy improvements across different backbone model architectures and noise environments. We also demonstrate that our proposed framework has improved the accuracy of unseen…
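
The idea of maximizing similarity across intra and inter samples of the same class can be illustrated with a generic supervised-contrastive-style loss. The sketch below is not the paper's formulation, which this page does not reproduce. It assumes that "intra" means two views of the same utterance (for example, a clean and a noise-augmented copy) and "inter" means different utterances sharing a keyword label. The function name `inter_intra_contrastive_loss`, the `temperature` parameter, and the two-view batching are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def inter_intra_contrastive_loss(z_a, z_b, labels, temperature=0.1):
    """Hypothetical supervised-contrastive-style regularizer (illustrative only).

    z_a, z_b : (N, D) embeddings of two views of the same N utterances
               (assumed here: clean and noise-augmented).
    labels   : (N,) integer keyword labels.
    Positives for an anchor are its other view (intra) and every view of
    other utterances with the same label (inter).
    """
    z = l2_normalize(np.concatenate([z_a, z_b], axis=0))  # (2N, D)
    y = np.concatenate([labels, labels], axis=0)          # (2N,)
    n = z.shape[0]

    # Pairwise cosine similarity, scaled by the temperature.
    sim = (z @ z.T) / temperature
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # an anchor never matches itself

    # Numerically stable log-softmax over all non-self samples.
    sim_max = sim.max(axis=1, keepdims=True)
    log_denom = np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    log_prob = sim - sim_max - log_denom

    # Same-label pairs, excluding self. Each anchor has at least one
    # positive because its other view carries the same label.
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_log_prob = np.where(pos_mask, log_prob, 0.0)
    loss_per_anchor = -pos_log_prob.sum(axis=1) / pos_mask.sum(axis=1)
    return float(loss_per_anchor.mean())
```

In a training loop, a term like this would typically be added to the classification loss with a weighting coefficient, so that embeddings of the same keyword are drawn together regardless of the noise condition. How the paper combines and weights its terms is not stated on this page.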

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing