Towards Scalable AASIST: Refining Graph Attention for Speech Deepfake Detection
Ivan Viakhirev, Daniil Sirota, Aleksandr Smirnov, Kirill Borodin

TL;DR
This paper refines the AASIST architecture for speech deepfake detection by integrating a frozen Wav2Vec 2.0 encoder, replacing graph attention with multi-head attention, and using a trainable fusion layer, achieving improved accuracy on the ASVspoof 5 dataset.
Contribution
It introduces targeted architectural modifications to the AASIST model, enhancing its performance and robustness in speech deepfake detection tasks.
Findings
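Reaches a 7.6% equal error rate (EER) on the ASVspoof 5 corpus, improving on a re-implemented AASIST baseline trained under the same conditions.
Ablation experiments suggest that each architectural change contributes to the overall improvement.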
Code is publicly available for reproducibility.
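For readers unfamiliar with the headline metric, the following is a minimal sketch of how an equal error rate can be computed from detector scores. It is not taken from the paper's code: the function name `compute_eer`, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector whose higher scores mean 'bona fide'.

    Illustrative helper (hypothetical name): it scans every observed score as a
    candidate threshold and picks the one where the false acceptance rate and
    the false rejection rate are closest to each other.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: a bona fide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed utterance scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On a finite score set the two error rates may never be exactly equal, so this sketch reports the midpoint at the closest crossing; evaluation toolkits often interpolate along the ROC curve instead.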
Abstract
Advances in voice conversion and text-to-speech synthesis have made automatic speaker verification (ASV) systems more susceptible to spoofing attacks. This work explores modest refinements to the AASIST anti-spoofing architecture. It incorporates a frozen Wav2Vec 2.0 encoder to retain self-supervised speech representations in limited-data settings, substitutes the original graph attention block with a standardized multi-head attention module using heterogeneous query projections, and replaces heuristic frame-segment fusion with a trainable, context-aware integration layer. When evaluated on the ASVspoof 5 corpus, the proposed system reaches a 7.6% equal error rate (EER), improving on a re-implemented AASIST baseline under the same training conditions. Ablation experiments suggest that each architectural change contributes to the overall performance, indicating that targeted adjustments…
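The abstract does not spell out how the trainable fusion layer is built. As one plausible reading, the sketch below shows a sigmoid-gated fusion of a frame-level and a segment-level feature vector, where the gate depends on both inputs. The class name `GatedFusion`, the gating form, and the random initialization are assumptions for illustration, not the authors' design.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GatedFusion:
    """Hypothetical context-aware fusion of two feature vectors.

    Each output dimension is a convex mix of the frame and segment features.
    The mixing gate is computed from the concatenation of both vectors, so it
    depends on context rather than being a fixed heuristic such as max or mean.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # In a real model these parameters would be learned by gradient
        # descent; here they are only randomly initialized (assumption).
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(dim)
        ]
        self.bias = [0.0] * dim

    def __call__(self, frame_vec, segment_vec):
        context = list(frame_vec) + list(segment_vec)  # concatenated inputs
        fused = []
        for i, (f, s) in enumerate(zip(frame_vec, segment_vec)):
            logit = sum(w * c for w, c in zip(self.weights[i], context))
            gate = sigmoid(logit + self.bias[i])
            fused.append(gate * f + (1.0 - gate) * s)
        return fused


fusion = GatedFusion(dim=3)
print(fusion([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Because the gate lies strictly between 0 and 1, each fused value stays between the corresponding frame and segment values, which keeps this toy version easy to inspect.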
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
