Quantum Vision Theory Applied to Audio Classification for Deepfake Speech Detection

Khalid Zaman; Melike Sah; Anuwat Chaiwongyenc; Cem Direkoglu

arXiv:2604.08104 · cs.CL · April 10, 2026

TL;DR

This paper introduces Quantum Vision (QV) theory for audio classification, transforming speech features into information waves to improve deepfake speech detection accuracy and robustness.

Contribution

The paper applies QV theory to speech spectrograms, building QV-based neural networks that outperform their non-QV counterparts on deepfake speech detection.

Findings

01

QV-based models outperform standard CNN and ViT models.

02

QV-CNN with MFCC features achieves 94.20% accuracy and 9.04% EER.

03

QV models show improved robustness in distinguishing genuine and spoofed speech.
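The equal error rate (EER) quoted in the findings is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how such a figure could be computed from detector scores; it is not the authors' evaluation code. The function name `equal_error_rate`, the label convention (1 = genuine, 0 = spoofed), and the assumption that higher scores mean "more likely genuine" are all illustrative choices, and NumPy is assumed to be available.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER from detector scores.

    Assumptions (illustrative, not from the paper):
    labels use 1 for genuine and 0 for spoofed speech,
    and a higher score means "more likely genuine".
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    genuine = scores[labels == 1]
    spoofed = scores[labels == 0]

    # Use every distinct score as a candidate decision threshold.
    thresholds = np.unique(scores)

    # False acceptance: a spoofed sample scores at or above the threshold.
    far = np.array([(spoofed >= t).mean() for t in thresholds])
    # False rejection: a genuine sample scores below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest
    # and report their average as the EER estimate.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    demo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    demo_labels = [1, 1, 1, 0, 1, 0, 0, 0]
    print(f"EER: {equal_error_rate(demo_scores, demo_labels):.2%}")
```

With a finite score set the two error curves rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate instead, which can give slightly different values.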

Abstract

We propose Quantum Vision (QV) theory as a new perspective for deep learning-based audio classification, applied to deepfake speech detection. Inspired by particle-wave duality in quantum physics, QV theory is based on the idea that data can be represented not only in its observable, collapsed form, but also as information waves. In conventional deep learning, models are trained directly on these collapsed representations, such as images. In QV theory, inputs are first transformed into information waves using a QV block, and then fed into deep learning models for classification. QV-based models improve performance in image classification compared to their non-QV counterparts. What if QV theory is applied to speech spectrograms for audio classification tasks? This is the motivation and novelty of the proposed approach. In this work, Short-Time Fourier Transform (STFT), Mel-spectrograms, and…
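As an illustration of the "collapsed" input the abstract refers to, the sketch below computes an STFT magnitude spectrogram from a raw waveform using NumPy only. It covers just the spectrogram step; the QV block itself is not specified in this excerpt and is deliberately not reproduced. The function name `stft_magnitude`, the Hann window, and the frame and hop sizes are assumptions chosen for the example, not values taken from the paper.

```python
import numpy as np


def stft_magnitude(signal, frame_len=256, hop=128):
    """Return an STFT magnitude spectrogram of shape (freq_bins, n_frames).

    frame_len and hop are illustrative defaults, not the paper's settings.
    The signal must contain at least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)

    # Number of full frames that fit without padding.
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])

    # Real FFT per frame; transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


if __name__ == "__main__":
    sample_rate = 8000  # Hz, assumed for the demo
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000 * t)  # one second of a 1 kHz tone

    spec = stft_magnitude(tone)
    peak_bin = int(np.argmax(spec.mean(axis=1)))
    print(spec.shape, peak_bin * sample_rate / 256, "Hz")
```

In a QV-style pipeline as described in the abstract, an array like `spec` would be passed through the QV block before reaching the classifier.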
