KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features

Ivan Kukanov; Jun Wah Ng

arXiv:2508.07337 · eess.AS · August 12, 2025

Open Access

TL;DR

This paper introduces a multimodal deepfake detection method that combines handcrafted visual features with self-supervised (SSL) audio representations, reporting 92.78% AUC on deepfake classification and aiming for robustness against unseen deepfake attacks.

Contribution

The paper proposes a multimodal approach that pairs handcrafted visual features with an SSL audio backbone and graph attention networks for deepfake detection and temporal localization.

Findings

01. Achieved 92.78% AUC on deepfake classification

02. Attained 0.3536 IoU for temporal localization using audio

03. Balances detection performance with real-world applicability
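
For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of their textbook definitions. It is not the challenge's official scoring code, which this page does not describe. The function names, the `Segment` type, and the (start, end) interval representation in seconds are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of a manipulated span: (start_sec, end_sec).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, truth: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    # Overlap length is zero when the intervals do not intersect.
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (fake, real) pairs where the fake scores higher.

    Labels: 1 = fake, 0 = real. Tied scores count as half a win.
    This pairwise form is quadratic in the sample count, so it is
    suited to small illustrations only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

As a usage example, a predicted span of (1.0, 3.0) against a true span of (2.0, 4.0) overlaps for 1 second out of a 3-second union, so `temporal_iou` returns one third. An AUC of 1.0 means every fake clip is scored above every real clip, and 0.5 corresponds to chance-level ranking.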

Abstract

The rapid development of audio-driven talking head generators and advanced Text-To-Speech (TTS) models has led to more sophisticated temporal deepfakes. These advances highlight the need for robust methods capable of detecting and localizing deepfakes, even under novel, unseen attack scenarios. Current state-of-the-art deepfake detectors, while accurate, are often computationally expensive and struggle to generalize to novel manipulation techniques. To address these challenges, we propose multimodal approaches for the AV-Deepfake1M 2025 challenge. For the visual modality, we leverage handcrafted features to improve interpretability and adaptability. For the audio modality, we adapt a self-supervised learning (SSL) backbone coupled with graph attention networks to capture rich audio representations, improving detection robustness. Our approach strikes a balance between performance and…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications