VIC-KD: Variance-Invariance-Covariance Knowledge Distillation to Make Keyword Spotting More Robust Against Adversarial Attacks
Heitor R. Guimarães, Arthur Pimentel, Anderson Avila, Tiago H. Falk

TL;DR
This paper introduces VIC-KD, a novel knowledge distillation method that enhances the robustness of keyword spotting models against adversarial attacks by leveraging geometric priors in self-supervised speech representations.
Contribution
VIC-KD is a new robust distillation approach that applies geometric priors to improve adversarial robustness and model compression in keyword spotting tasks.
Findings
On the Google Speech Commands datasets, VIC-KD outperforms state-of-the-art robust distillation methods by 12% in robust accuracy.
Imposing geometric priors enhances model robustness against adversarial attacks.
The method compresses models to a size suitable for edge devices while improving their adversarial robustness.
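Robust accuracy, the metric quoted in the findings above, is the accuracy a model retains on adversarially perturbed inputs. The sketch below illustrates the idea on a toy linear softmax classifier attacked with one-step FGSM. The attack choice, the classifier, and the names `fgsm` and `accuracy` are assumptions made for illustration; this summary does not state which attacks or models the paper evaluates.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fgsm(x, y, W, b, eps):
    """One-step FGSM against a linear classifier with logits = x @ W + b.

    Illustrative assumption: the attacker knows W and b (white-box setting).
    """
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0   # d(cross-entropy)/d(logits) = p - onehot(y)
    grad_x = p @ W.T                 # chain rule back to the input
    return x + eps * np.sign(grad_x)

def accuracy(x, y, W, b):
    # Fraction of inputs whose arg-max logit matches the label.
    return float(np.mean(np.argmax(x @ W + b, axis=1) == y))

# Toy data: two classes, two features, identity weights.
W = np.eye(2)
b = np.zeros(2)
x = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 1])

clean_acc = accuracy(x, y, W, b)
robust_acc = accuracy(fgsm(x, y, W, b, eps=1.0), y, W, b)
```

Here `clean_acc` is measured on the original inputs and `robust_acc` on the perturbed ones; a robust distillation method aims to keep the second number high for the compressed Student.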
Abstract
Keyword spotting (KWS) refers to the task of identifying a set of predefined words in audio streams. With the advances seen recently with deep neural networks, it has become a popular technology to activate and control small devices, such as voice assistants. Relying on such models for edge devices, however, can be challenging due to hardware constraints. Moreover, as adversarial attacks have increased against voice-based technologies, developing solutions robust to such attacks has become crucial. In this work, we propose VIC-KD, a robust distillation recipe for model compression and adversarial robustness. Using self-supervised speech representations, we show that imposing geometric priors to the latent representations of both Teacher and Student models leads to more robust target models. Experiments on the Google Speech Commands datasets show that the proposed methodology improves…
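The "geometric priors" in the method's name can be pictured with the three generic variance-invariance-covariance terms popularized by VICReg. The following is a minimal NumPy sketch of those terms, not the paper's actual loss: the function names, the hinge target `gamma`, the `eps` constant, and the default weights are assumptions borrowed from the common VICReg formulation, and how VIC-KD combines them with the distillation objective is not given in this summary.

```python
import numpy as np

def vic_terms(z_a, z_b, gamma=1.0, eps=1e-4):
    """Return (invariance, variance, covariance) terms for two embedding batches.

    z_a, z_b: arrays of shape (batch, dim), e.g. Teacher and Student latents
    (that pairing is an assumption for illustration).
    """
    n, d = z_a.shape

    # Invariance: paired embeddings should be close to each other.
    inv = np.mean((z_a - z_b) ** 2)

    def variance(z):
        # Hinge that pushes each dimension's standard deviation up to gamma.
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def covariance(z):
        # Penalize off-diagonal covariance so dimensions are decorrelated.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    var = variance(z_a) + variance(z_b)
    cov = covariance(z_a) + covariance(z_b)
    return inv, var, cov

def vic_loss(z_a, z_b, w_inv=25.0, w_var=25.0, w_cov=1.0):
    # Weighted sum of the three terms; the weights are illustrative defaults.
    inv, var, cov = vic_terms(z_a, z_b)
    return w_inv * inv + w_var * var + w_cov * cov

rng = np.random.default_rng(0)
z = rng.standard_normal((64, 8))
same_inv, same_var, same_cov = vic_terms(z, z)

# A collapsed batch (every embedding identical) triggers the variance hinge.
collapsed = np.ones((16, 4))
col_inv, col_var, col_cov = vic_terms(collapsed, collapsed)
```

The variance term guards against representational collapse, the covariance term spreads information across dimensions, and the invariance term ties the two sets of embeddings together.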
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
