Towards Scalable AASIST: Refining Graph Attention for Speech Deepfake Detection

Ivan Viakhirev; Daniil Sirota; Aleksandr Smirnov; Kirill Borodin

arXiv:2507.11777 · cs.SD · July 17, 2025

TL;DR

This paper refines the AASIST architecture for speech deepfake detection by integrating a frozen Wav2Vec 2.0 encoder, replacing the graph attention block with multi-head attention, and using a trainable fusion layer. The resulting system reaches a 7.6% equal error rate (EER) on the ASVspoof 5 corpus, improving on a re-implemented AASIST baseline trained under the same conditions.

Contribution

The paper introduces targeted architectural modifications to the AASIST model that improve its performance and robustness in speech deepfake detection.

Findings

01

The proposed system reaches a 7.6% equal error rate (EER) on the ASVspoof 5 corpus.

02

Ablation experiments suggest that each architectural change contributes to the overall performance.

03

Code is publicly available for reproducibility.

Abstract

Advances in voice conversion and text-to-speech synthesis have made automatic speaker verification (ASV) systems more susceptible to spoofing attacks. This work explores modest refinements to the AASIST anti-spoofing architecture. It incorporates a frozen Wav2Vec 2.0 encoder to retain self-supervised speech representations in limited-data settings, substitutes the original graph attention block with a standardized multi-head attention module using heterogeneous query projections, and replaces heuristic frame-segment fusion with a trainable, context-aware integration layer. When evaluated on the ASVspoof 5 corpus, the proposed system reaches a 7.6% equal error rate (EER), improving on a re-implemented AASIST baseline under the same training conditions. Ablation experiments suggest that each architectural change contributes to the overall performance, indicating that targeted adjustments…
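
The headline metric, the equal error rate, is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The sketch below is a minimal pure-Python illustration of that definition, not the evaluation code used by the authors or by the ASVspoof organizers. The function name `compute_eer`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for this example.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detector scores.

    Assumption: a higher score means the utterance is more likely bona fide.
    The sweep checks every observed score as a candidate threshold and keeps
    the one where the two error rates are closest to each other.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # On a finite score set the two rates may never be exactly equal,
            # so report their average at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical, not taken from the paper).
bonafide = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.75]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

In this toy case one of four bona fide scores and one of four spoof scores fall on the wrong side of the chosen threshold, so both error rates are 25%. A reported EER of 7.6% means the two error rates meet at 7.6% on the evaluation set.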

Code & Models

Repositories

KORALLLL/AASIST_SCALING
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques