Trade-offs Between Capacity and Robustness in Neural Audio Codecs for Adversarially Robust Speech Recognition

Jordan Prescott; Thanathai Lertpetchpun; Shrikanth Narayanan

arXiv:2603.09034 · eess.AS · March 11, 2026

TL;DR

This paper investigates how the granularity of neural audio codecs, set by residual vector quantization (RVQ) depth, affects the balance between speech recognition accuracy and robustness to adversarial attacks. It finds a non-monotonic trade-off: transcription error is lowest at intermediate depths.

Contribution

The paper analyzes how residual vector quantization (RVQ) depth affects adversarial robustness and shows that neural codecs outperform traditional defenses under adaptive attacks.

Findings

01

Shallow quantization suppresses adversarial perturbations but degrades speech content.

02

Deeper quantization preserves both speech content and adversarial perturbations.

03

Intermediate quantization depths balance these two effects and minimize transcription error.
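
To make the role of depth concrete, here is a toy scalar sketch of residual vector quantization. It is not the codec used in the paper: the 4-entry codebooks, the 4x refinement per stage, and the names `make_codebook`, `rvq_encode`, and `rvq_decode` are all assumptions chosen for illustration. Each stage quantizes the residual left by the previous one, so a small perturbation leaves the tokens unchanged at shallow depth and only alters them once the stages become fine enough to resolve it.

```python
def make_codebook(stage):
    """Hypothetical 4-entry scalar codebook; each stage is 4x finer than the last."""
    scale = 0.5 * (0.25 ** stage)
    return [scale * c for c in (-1.5, -0.5, 0.5, 1.5)]


def rvq_encode(x, depth):
    """Quantize x with `depth` residual stages and return one token per stage."""
    tokens = []
    residual = x
    for stage in range(depth):
        codebook = make_codebook(stage)
        # Pick the codebook entry closest to the current residual.
        idx = min(range(len(codebook)), key=lambda i: abs(residual - codebook[i]))
        tokens.append(idx)
        # The next stage only sees what this stage failed to capture.
        residual -= codebook[idx]
    return tokens


def rvq_decode(tokens):
    """Reconstruct the value by summing the selected entry from each stage."""
    return sum(make_codebook(stage)[idx] for stage, idx in enumerate(tokens))


if __name__ == "__main__":
    clean, perturbed = 0.30, 0.31  # illustrative sample and a small perturbation
    for depth in (1, 2, 4):
        clean_tokens = rvq_encode(clean, depth)
        perturbed_tokens = rvq_encode(perturbed, depth)
        error = abs(clean - rvq_decode(clean_tokens))
        print(depth, clean_tokens == perturbed_tokens, error)
```

In this toy setting, depth trades reconstruction error against sensitivity to the perturbation, which mirrors the qualitative pattern in the findings above. It says nothing about where the best depth lies for a real codec and ASR model.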

Abstract

Adversarial perturbations exploit vulnerabilities in automatic speech recognition (ASR) systems while preserving human-perceived linguistic content. Neural audio codecs impose a discrete bottleneck that can suppress fine-grained signal variations associated with adversarial noise. We examine how the granularity of this bottleneck, controlled by residual vector quantization (RVQ) depth, shapes adversarial robustness. We observe a non-monotonic trade-off under gradient-based attacks: shallow quantization suppresses adversarial perturbations but degrades speech content, while deeper quantization preserves both content and perturbations. Intermediate depths balance these effects and minimize transcription error. We further show that adversarially induced changes in discrete codebook tokens strongly correlate with transcription error. These gains persist under adaptive attacks, where neural…
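
The abstract relates adversarially induced token changes to transcription error. One simple way to measure such changes is the fraction of token positions that differ between the clean and adversarial encodings. The sketch below assumes that definition; the paper's exact metric is not given here, and the function name `token_change_rate` and the frames-by-levels list layout are hypothetical.

```python
def token_change_rate(clean_tokens, adv_tokens):
    """Fraction of (frame, RVQ level) positions whose token differs.

    Both arguments are lists of frames; each frame is a list with one
    codebook index per RVQ level (an assumed layout).
    """
    if len(clean_tokens) != len(adv_tokens):
        raise ValueError("token grids must cover the same number of frames")
    total = 0
    changed = 0
    for clean_frame, adv_frame in zip(clean_tokens, adv_tokens):
        if len(clean_frame) != len(adv_frame):
            raise ValueError("frames must have the same number of RVQ levels")
        for clean_idx, adv_idx in zip(clean_frame, adv_frame):
            total += 1
            if clean_idx != adv_idx:
                changed += 1
    # An empty grid has no positions to compare, so report no change.
    return changed / total if total else 0.0
```

Computed per utterance, a rate like this could then be compared against word error rate across a test set, for example with a rank or Pearson correlation.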

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Wireless Signal Modulation Classification · Explainable Artificial Intelligence (XAI)