Trade-offs Between Capacity and Robustness in Neural Audio Codecs for Adversarially Robust Speech Recognition
Jordan Prescott, Thanathai Lertpetchpun, Shrikanth Narayanan

TL;DR
This paper investigates how the granularity of neural audio codecs affects the balance between speech recognition accuracy and robustness to adversarial attacks, revealing a non-monotonic trade-off governed by residual vector quantization (RVQ) depth.
Contribution
The paper analyzes how residual vector quantization (RVQ) depth affects adversarial robustness and shows that neural codecs outperform traditional defenses under adaptive attacks.
Findings
Shallow quantization suppresses adversarial noise but degrades speech quality.
Deeper quantization preserves both speech content and adversarial perturbations.
Intermediate quantization depths optimize robustness and transcription accuracy.
Abstract
Adversarial perturbations exploit vulnerabilities in automatic speech recognition (ASR) systems while preserving human-perceived linguistic content. Neural audio codecs impose a discrete bottleneck that can suppress fine-grained signal variations associated with adversarial noise. We examine how the granularity of this bottleneck, controlled by residual vector quantization (RVQ) depth, shapes adversarial robustness. We observe a non-monotonic trade-off under gradient-based attacks: shallow quantization suppresses adversarial perturbations but degrades speech content, while deeper quantization preserves both content and perturbations. Intermediate depths balance these effects and minimize transcription error. We further show that adversarially induced changes in discrete codebook tokens strongly correlate with transcription error. These gains persist under adaptive attacks, where neural…
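To make the role of RVQ depth concrete, the following is a minimal toy sketch of residual vector quantization and of a token-change measure between a clean and a perturbed input. It is not the paper's codec or evaluation: the codebooks are random rather than learned, the perturbation is Gaussian noise standing in for an adversarial one, and the names `rvq_encode`, `token_change_rate`, `dim`, `codebook_size`, and `max_depth` are hypothetical choices made for illustration.

```python
import numpy as np


def rvq_encode(x, codebooks, depth):
    """Quantize x with the first `depth` residual codebooks.

    Returns the list of chosen token indices and the reconstruction.
    """
    residual = x.copy()
    recon = np.zeros_like(x)
    tokens = []
    for cb in codebooks[:depth]:
        # Pick the codeword nearest to the current residual (Euclidean distance).
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        tokens.append(idx)
        recon = recon + cb[idx]
        # The next stage only sees what this stage failed to explain.
        residual = residual - cb[idx]
    return tokens, recon


def token_change_rate(clean_tokens, adv_tokens):
    """Fraction of token positions that differ between two token lists."""
    clean = np.asarray(clean_tokens)
    adv = np.asarray(adv_tokens)
    return float(np.mean(clean != adv))


rng = np.random.default_rng(0)
dim, codebook_size, max_depth = 8, 16, 4

# Assumption: random codebooks with shrinking scale, loosely mimicking
# coarse-to-fine stages. A real neural codec learns these.
codebooks = [
    rng.normal(scale=0.5 ** k, size=(codebook_size, dim))
    for k in range(max_depth)
]

x_clean = rng.normal(size=dim)
# Assumption: small Gaussian noise as a stand-in for an adversarial perturbation.
x_adv = x_clean + 0.05 * rng.normal(size=dim)

for depth in range(1, max_depth + 1):
    t_clean, r_clean = rvq_encode(x_clean, codebooks, depth)
    t_adv, _ = rvq_encode(x_adv, codebooks, depth)
    err = float(np.linalg.norm(x_clean - r_clean))
    rate = token_change_rate(t_clean, t_adv)
    print(f"depth={depth} recon_error={err:.3f} token_change_rate={rate:.2f}")
```

Each additional stage encodes a finer residual, so a small perturbation has more opportunities to flip a token as depth grows. This is the intuition behind using token changes as a proxy for how much of the perturbation survives the bottleneck; the toy numbers printed here carry no empirical weight.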
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Wireless Signal Modulation Classification · Explainable Artificial Intelligence (XAI)
