VIC-KD: Variance-Invariance-Covariance Knowledge Distillation to Make Keyword Spotting More Robust Against Adversarial Attacks

Heitor R. Guimarães; Arthur Pimentel; Anderson Avila; Tiago H. Falk

arXiv:2309.12914 · eess.AS · September 25, 2023

TL;DR

This paper introduces VIC-KD, a novel knowledge distillation method that enhances the robustness of keyword spotting models against adversarial attacks by leveraging geometric priors in self-supervised speech representations.

Contribution

VIC-KD is a new robust distillation approach for keyword spotting that combines model compression with adversarial robustness by imposing geometric priors on the latent representations of the Teacher and Student models.

Findings

01

VIC-KD outperforms state-of-the-art methods by 12% in robust accuracy.

02

Imposing geometric priors enhances model robustness against adversarial attacks.

03

The method compresses models to sizes suitable for edge devices.

Abstract

Keyword spotting (KWS) refers to the task of identifying a set of predefined words in audio streams. With recent advances in deep neural networks, it has become a popular technology to activate and control small devices, such as voice assistants. Relying on such models for edge devices, however, can be challenging due to hardware constraints. Moreover, as adversarial attacks against voice-based technologies have increased, developing solutions robust to such attacks has become crucial. In this work, we propose VIC-KD, a robust distillation recipe for model compression and adversarial robustness. Using self-supervised speech representations, we show that imposing geometric priors on the latent representations of both Teacher and Student models leads to more robust target models. Experiments on the Google Speech Commands datasets show that the proposed methodology improves…
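The name "Variance-Invariance-Covariance" suggests priors in the style of VICReg: keep each latent dimension's spread above a threshold (variance), pull Student embeddings toward Teacher embeddings of the same input (invariance), and decorrelate latent dimensions (covariance). The snippet below is a minimal NumPy sketch under that assumption, not the authors' implementation. The function names, the threshold gamma = 1, and the weights lam = 25, mu = 25, nu = 1 (VICReg's commonly used defaults) are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def variance_term(z, gamma=1.0, eps=1e-4):
    # Hinge on the per-dimension standard deviation across the batch:
    # dimensions whose std falls below gamma are penalized (anti-collapse prior).
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))

def invariance_term(z_student, z_teacher):
    # Mean squared error between Student and Teacher embeddings of the same inputs.
    return float(np.mean((z_student - z_teacher) ** 2))

def covariance_term(z):
    # Sum of squared off-diagonal covariance entries, scaled by the dimension d,
    # so that latent dimensions are pushed toward being decorrelated.
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = (zc.T @ zc) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float(np.sum(off_diag ** 2) / d)

def vic_kd_loss(z_student, z_teacher, lam=25.0, mu=25.0, nu=1.0):
    # Hypothetical combined objective: invariance between the two models, plus
    # variance and covariance priors applied to BOTH Teacher and Student latents.
    inv = invariance_term(z_student, z_teacher)
    var = variance_term(z_student) + variance_term(z_teacher)
    cov = covariance_term(z_student) + covariance_term(z_teacher)
    return lam * inv + mu * var + nu * cov

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_teacher = rng.normal(size=(64, 16))  # stand-in for Teacher latents (batch, dim)
    z_student = z_teacher + 0.1 * rng.normal(size=(64, 16))  # noisy Student latents
    print(vic_kd_loss(z_student, z_teacher))
```

In a real training loop these terms would be computed with an autodiff framework so that gradients flow into the Student; if the Teacher is frozen, its variance and covariance terms only shift the loss value and do not update any weights.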

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing