Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?

Bikash Dutta; Rishabh Ranjan; Shyam Sathvik; Mayank Vatsa; Richa Singh

arXiv:2506.06756·cs.SD·June 10, 2025


Open Access

TL;DR

This paper evaluates how quantization affects zero-shot audio spoofing detection in large audio language models. FP16 preserves FP32-level performance, while INT8 amplifies predictive biases and lowers accuracy, which informs the choice of precision for resource-constrained deployment.

Contribution

This is the first comprehensive study of how quantization affects zero-shot spoof detection in large audio language models, and it highlights FP16 as an effective reduced-precision setting.

Findings

01 FP16 quantization causes negligible performance loss compared to FP32.

02 INT8 quantization increases biases and reduces accuracy.

03 All models show a strong bias toward the spoof class, so their practical performance is close to random classification.
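
The third finding can look paradoxical next to high raw accuracy. A minimal Python sketch, using hypothetical class counts (90 spoof and 10 bona fide clips, not figures from the paper) and a hypothetical helper `spoof_metrics`, shows how a detector that always answers "spoof" can score high accuracy on a spoof-heavy test set while its balanced accuracy sits at chance level.

```python
def spoof_metrics(labels, preds, positive="spoof"):
    """Return (accuracy, spoof recall, bona fide recall, balanced accuracy).

    Hypothetical helper for illustration; it is not the paper's evaluation code.
    """
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    tn = sum(1 for y, p in pairs if y != positive and p != positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in labels")
    accuracy = (tp + tn) / len(pairs)
    spoof_recall = tp / (tp + fn)      # fraction of spoof clips caught
    bonafide_recall = tn / (tn + fp)   # fraction of genuine clips accepted
    balanced = (spoof_recall + bonafide_recall) / 2
    return accuracy, spoof_recall, bonafide_recall, balanced


# Assumed, illustrative class balance: 90 spoof clips and 10 bona fide clips.
labels = ["spoof"] * 90 + ["bonafide"] * 10
# A fully biased detector that labels every clip as spoof.
preds = ["spoof"] * len(labels)

accuracy, spoof_recall, bonafide_recall, balanced = spoof_metrics(labels, preds)
print(f"accuracy={accuracy:.2f} balanced_accuracy={balanced:.2f}")
```

With these assumed counts the raw accuracy is 0.90, yet every bona fide clip is rejected and the balanced accuracy is 0.50, which is what a coin flip would achieve.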

Abstract

Quantization is essential for deploying large audio language models (LALMs) efficiently in resource-constrained environments. However, its impact on complex tasks, such as zero-shot audio spoofing detection, remains underexplored. This study evaluates the zero-shot capabilities of five LALMs, GAMA, LTU-AS, MERaLiON, Qwen-Audio, and SALMONN, across three distinct datasets: ASVspoof2019, In-the-Wild, and WaveFake, and investigates their robustness to quantization (FP32, FP16, INT8). Despite high initial spoof detection accuracy, our analysis demonstrates severe predictive biases toward spoof classification across all models, rendering their practical performance equivalent to random classification. Interestingly, quantization to FP16 precision resulted in negligible performance degradation compared to FP32, effectively halving memory and computational requirements without materially…
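
The memory claim in the abstract follows from the storage width of each weight. As a rough sketch, assuming a hypothetical 7-billion-parameter model (the parameter count is illustrative, not taken from the paper) and counting weights only, the footprint at each precision can be estimated as follows. The names `BYTES_PER_PARAM` and `weight_memory_gib` are introduced here for illustration.

```python
# Storage width per weight for the three precisions named in the abstract.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_memory_gib(num_params, precision):
    """Estimate weight storage in GiB; ignores activations and runtime overhead."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


# Hypothetical model size, chosen only to make the arithmetic concrete.
NUM_PARAMS = 7_000_000_000

for precision in ("FP32", "FP16", "INT8"):
    print(f"{precision}: {weight_memory_gib(NUM_PARAMS, precision):.1f} GiB")
```

Under this weights-only estimate, FP16 needs exactly half the storage of FP32 and INT8 a quarter. Real deployments also hold activations, caches, and framework overhead, so measured savings may differ.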


Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing