SIGNL: A Label-Efficient Audio Deepfake Detection System via Spectral-Temporal Graph Non-Contrastive Learning
Falih Gozi Febrinanto, Kristen Moore, Chandra Thapa, Jiangang Ma, Vidya Saikrishna

TL;DR
SIGNL is a spectral-temporal graph non-contrastive learning framework for label-efficient audio deepfake detection. It learns structured spectral-temporal graph representations from unlabeled audio, so that only a small amount of labeled data is needed to train an accurate detector that generalizes to unseen conditions.
Contribution
The paper presents a novel dual-view graph modeling approach for audio deepfake detection, combining spectral and temporal graphs with non-contrastive self-supervised learning.
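To make the "dual-view" idea concrete, here is a minimal sketch of one plausible way to turn a spectrogram into two graphs. It assumes that the spectral view has one node per frequency bin (that bin's values across time) and that the temporal view has one node per time frame (that frame's values across frequency), with edges to each node's k most cosine-similar neighbours. The helpers `cosine`, `knn_graph` and `dual_view_graphs` are hypothetical names for illustration. This summary does not give the paper's actual node and edge definitions, and they may differ.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_graph(nodes: Matrix, k: int) -> List[List[int]]:
    """For each node, list the indices of its k most cosine-similar other nodes."""
    adj = []
    for i, x in enumerate(nodes):
        sims = [(cosine(x, y), j) for j, y in enumerate(nodes) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))  # most similar first, ties by index
        adj.append([j for _, j in sims[:k]])
    return adj


def dual_view_graphs(spec: Matrix, k: int = 2) -> Tuple[List[List[int]], List[List[int]]]:
    """Build (spectral, temporal) kNN graphs from a freq_bins x time_frames spectrogram.

    Hypothetical construction: spectral nodes are rows (frequency bins),
    temporal nodes are columns (time frames).
    """
    spectral_nodes = [list(row) for row in spec]
    temporal_nodes = [list(col) for col in zip(*spec)]
    return knn_graph(spectral_nodes, k), knn_graph(temporal_nodes, k)


# Toy 4-bin x 3-frame "spectrogram"
spec = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
]
spectral_adj, temporal_adj = dual_view_graphs(spec, k=1)
print(spectral_adj)  # [[1], [0], [3], [2]]
```

In this toy input, bins 0 and 1 share a similar time profile and become neighbours, as do bins 2 and 3. A graph encoder would then process each view and fuse the two sets of node embeddings.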
Findings
Achieves a 7.88% equal error rate (EER) on the ASVspoof 2021 DF (deepfake) track with minimal labeled data.
Outperforms existing methods on multiple benchmarks.
Generalizes well to unseen conditions, demonstrating robustness.
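The EER figure above is the operating point at which the false acceptance rate (spoofed audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected). A 7.88% EER therefore means that both error rates are about 7.88% at that threshold. The sketch below approximates EER by sweeping thresholds over the observed scores. It assumes that higher scores mean "more likely bona fide" (label 1) and that label 0 marks spoofed audio. It is an illustrative approximation, not the official ASVspoof evaluation code.

```python
from typing import List, Tuple


def equal_error_rate(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Approximate (EER, threshold) by a threshold sweep.

    Assumed convention: label 1 = bona fide, label 0 = spoof;
    a sample is accepted as bona fide when score >= threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one bona fide and one spoof score")
    best = (float("inf"), 0.0, 0.0)  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)  # spoofs wrongly accepted
        frr = sum(s < t for s in pos) / len(pos)  # bona fide wrongly rejected
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(equal_error_rate(scores, labels))  # (0.25, 0.6)
```

On real evaluation sets the FAR and FRR curves rarely cross exactly at an observed score, so evaluation toolkits usually interpolate between neighbouring thresholds. The midpoint used here is a simpler approximation.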
Abstract
Audio deepfake detection is increasingly important as synthetic speech becomes more realistic and accessible. Recent methods, including those using graph neural networks (GNNs) to model frequency and temporal dependencies, show strong potential but need large amounts of labeled data, which limits their practical use. Label-efficient alternatives like graph-based non-contrastive learning offer a potential solution, as they can learn useful representations from unlabeled data without using negative samples. However, current graph non-contrastive approaches are built for single-view graph representations and cannot be directly used for audio, which has unique spectral and temporal structures. Bridging this gap requires dual-view graph modeling suited to audio signals. In this work, we introduce SIGNL (Spectral-temporal vIsion Graph Non-contrastive Learning), a label-efficient expert system…
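The abstract's key point is that non-contrastive learning needs no negative samples: only two views of the same input are pulled together. This summary does not state SIGNL's exact objective. The sketch below therefore uses the generic BYOL/BGRL-style formulation as an assumption. An online encoder's prediction is compared with a target encoder's embedding through the loss 2 - 2·cos(p, z), and the target's parameters track the online ones by exponential moving average (EMA). The names `non_contrastive_loss` and `ema_update` are hypothetical, and vectors stand in for embeddings and parameters.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """L2-normalize a vector (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def non_contrastive_loss(online_pred: Vector, target_emb: Vector) -> float:
    """BYOL/BGRL-style loss 2 - 2*cos(p, z), ranging from 0 (aligned) to 4 (opposite).

    Only the positive pair is compared; no negatives are used. In real training,
    target_emb is a stop-gradient constant.
    """
    p, z = normalize(online_pred), normalize(target_emb)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def symmetric_loss(p1: Vector, z2: Vector, p2: Vector, z1: Vector) -> float:
    """Sum the loss with the two views swapped (view 1 predicts view 2 and vice versa)."""
    return non_contrastive_loss(p1, z2) + non_contrastive_loss(p2, z1)


def ema_update(target: Vector, online: Vector, decay: float = 0.99) -> Vector:
    """Move target parameters slightly toward the online parameters."""
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


print(non_contrastive_loss([1.0, 0.0], [2.0, 0.0]))  # 0.0 (same direction)
print(non_contrastive_loss([1.0, 0.0], [0.0, 1.0]))  # 2.0 (orthogonal)
```

Because the target is updated only through the EMA and receives no gradient, the asymmetric online/target setup is what helps these methods avoid collapsing to a constant embedding without negative pairs. The decay value of 0.99 is an arbitrary illustrative choice.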
