Audio Deepfake Detection at the First Greeting: "Hi!"

Haohan Shi; Xiyu Shi; Safak Dogan; Tianjin Huang; Yunxiao Zhang

arXiv:2601.19573 · eess.AS · May 12, 2026

TL;DR

This paper introduces S-MGAA, a lightweight model for detecting audio deepfakes in very short speech segments, emphasizing robustness and efficiency for real-time communication scenarios.

Contribution

The paper proposes Short-MGAA (S-MGAA), a lightweight extension of Multi-Granularity Adaptive Time-Frequency Attention. Its two added modules, PCEM and FCEM, are tailored to short, degraded audio inputs and improve detection accuracy and efficiency.

Findings

01. S-MGAA outperforms nine state-of-the-art baselines.

02. It demonstrates robustness to communication degradations.

03. It offers a favorable efficiency-accuracy trade-off for real-time deployment.

Abstract

This paper focuses on audio deepfake detection under real-world communication degradations, with an emphasis on ultra-short inputs (0.5-2.0 s). The goal is to detect synthetic speech at the opening of a conversation, e.g., when a scammer says "Hi." We propose Short-MGAA (S-MGAA), a novel lightweight extension of Multi-Granularity Adaptive Time-Frequency Attention, designed to enhance discriminative representation learning for short, degraded inputs subjected to communication processing and perturbations. S-MGAA integrates two tailored modules: a Pixel-Channel Enhanced Module (PCEM) that amplifies fine-grained time-frequency saliency, and a Frequency Compensation Enhanced Module (FCEM) that supplements limited temporal evidence via multi-scale frequency modeling and adaptive frequency-temporal interaction. Extensive experiments demonstrate that S-MGAA consistently surpasses nine…
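To make the ultra-short input setting concrete, the following minimal sketch shows one way to cut the opening of a recording into fixed-length views within the 0.5-2.0 s range. It is an illustration only, not the authors' preprocessing: the function names, the 16 kHz sample rate in the usage note, the duration grid, and the zero-padding of clips that are too short are all assumptions that do not come from the paper.

```python
# Hypothetical helpers; names and defaults are illustrative assumptions,
# not taken from the S-MGAA paper.

# Assumed evaluation grid inside the 0.5-2.0 s range stated in the abstract.
DURATIONS_S = (0.5, 1.0, 1.5, 2.0)


def crop_leading_segment(samples, sample_rate, duration_s):
    """Return the first `duration_s` seconds of `samples` as a list.

    Clips shorter than the requested duration are zero-padded at the end
    (an assumption; other padding strategies are possible).
    """
    if sample_rate <= 0 or duration_s <= 0:
        raise ValueError("sample_rate and duration_s must be positive")
    n_target = int(round(sample_rate * duration_s))
    head = list(samples[:n_target])
    return head + [0.0] * (n_target - len(head))


def ultra_short_views(samples, sample_rate, durations=DURATIONS_S):
    """Map each duration (in seconds) to the corresponding leading segment."""
    return {d: crop_leading_segment(samples, sample_rate, d) for d in durations}
```

For example, calling `ultra_short_views(waveform, 16000)` on a 1.5 s mono waveform sampled at an assumed 16 kHz yields four segments of 8000, 16000, 24000, and 32000 samples, with the last one zero-padded. Each segment could then be passed to whatever detector is under evaluation.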
