Generalized Spoofing Detection Inspired from Audio Generation Artifacts

Yang Gao; Tyler Vuong; Mahsa Elyasi; Gaurav Bharaj; Rita Singh

arXiv:2104.04111 · cs.SD · June 29, 2021

Open Access

TL;DR

This paper introduces a novel spectro-temporal feature for audio deepfake detection, a 2D DCT computed over the log-Mel spectrogram. By capturing generation artifacts in the frequency domain, it outperforms existing features and achieves state-of-the-art results.

Contribution

The paper proposes a new 2D DCT feature for spoofing detection that, combined with a CNN-based model, improves detection accuracy and generalization over previous methods.
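A minimal sketch of a 2D DCT spectro-temporal feature, assuming the log-Mel spectrogram has already been computed (for example with a library such as librosa) and that the 2D DCT is taken over sliding windows of frames. The window length `win`, the hop `hop`, and the function names `dct2_matrix` and `spectro_temporal_dct` are hypothetical illustration choices, not the paper's configuration.

```python
import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis (n x n); row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)   # scale the DC row so the basis is orthonormal
    basis[1:, :] *= np.sqrt(2.0 / n)  # scale the remaining rows
    return basis


def spectro_temporal_dct(log_mel: np.ndarray, win: int = 64, hop: int = 32) -> np.ndarray:
    """2D DCT over sliding windows of a log-Mel spectrogram.

    log_mel: array of shape (n_mels, n_frames).
    Returns DCT coefficients of shape (n_windows, n_mels, win): axis 1 indexes
    spectral modulation, axis 2 indexes temporal modulation.
    `win` and `hop` are hypothetical defaults, not values from the paper.
    """
    n_mels, n_frames = log_mel.shape
    if n_frames < win:
        raise ValueError("spectrogram has fewer frames than the window length")
    d_freq = dct2_matrix(n_mels)  # DCT along the Mel-frequency axis
    d_time = dct2_matrix(win)     # DCT along the time axis
    starts = range(0, n_frames - win + 1, hop)
    # Separable 2D DCT of each block: D_freq @ block @ D_time^T
    return np.stack([d_freq @ log_mel[:, s:s + win] @ d_time.T for s in starts])


rng = np.random.default_rng(0)
log_mel = rng.standard_normal((80, 200))  # stand-in for a real log-Mel spectrogram
feats = spectro_temporal_dct(log_mel)
print(feats.shape)  # (5, 80, 64)
```

Because the basis is orthonormal, the transform preserves energy and is exactly invertible; low-index coefficients summarize slow, long-range structure, while high-index ones expose fine, repeated patterns of the kind the paper attributes to generation artifacts.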

Findings

01

Achieved a 14% reduction in t-DCF score over previous top systems.

02

Demonstrated the effectiveness of the 2D DCT feature over traditional features.

03

Validated the model's generalization on external datasets.

Abstract

State-of-the-art methods for audio generation suffer from fingerprint artifacts and repeated inconsistencies across the temporal and spectral domains. Such artifacts could be well captured by frequency-domain analysis over the spectrogram. Thus, we propose a novel use of a long-range spectro-temporal modulation feature, the 2D DCT over the log-Mel spectrogram, for audio deepfake detection. We show that this feature works better than the log-Mel spectrogram, CQCC, and MFCC as a candidate for capturing such artifacts. Along with this new feature, we employ spectrum augmentation and feature normalization to decrease overfitting and bridge the gap between the training and test datasets. We developed a CNN-based baseline that achieved a t-DCF of 0.0849 and outperformed the previous top single systems reported in the ASVspoof 2019 challenge. Finally, by combining our baseline with our…
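The abstract mentions spectrum augmentation and feature normalization without specifying them. A minimal sketch, assuming SpecAugment-style frequency and time masking and per-utterance mean-variance normalization; both techniques, the mask widths, and the function names `mean_var_normalize` and `spec_augment` are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np


def mean_var_normalize(feat: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Assumed per-utterance normalization: zero mean, unit variance per feature row.

    feat: array of shape (n_features, n_frames); statistics are taken over time.
    """
    mean = feat.mean(axis=1, keepdims=True)
    std = feat.std(axis=1, keepdims=True)
    return (feat - mean) / (std + eps)


def spec_augment(
    feat: np.ndarray,
    rng: np.random.Generator,
    max_f: int = 8,
    max_t: int = 20,
    n_freq_masks: int = 1,
    n_time_masks: int = 1,
) -> np.ndarray:
    """SpecAugment-style masking on a (n_features, n_frames) array (assumed variant).

    Masked regions are set to 0, which equals the row mean after mean_var_normalize.
    Returns a new array; the input is not modified.
    """
    out = feat.copy()
    n_freq, n_time = out.shape
    for _ in range(n_freq_masks):
        f = int(rng.integers(0, min(max_f, n_freq) + 1))  # mask width in bins
        f0 = int(rng.integers(0, n_freq - f + 1))          # mask start bin
        out[f0:f0 + f, :] = 0.0
    for _ in range(n_time_masks):
        t = int(rng.integers(0, min(max_t, n_time) + 1))  # mask width in frames
        t0 = int(rng.integers(0, n_time - t + 1))          # mask start frame
        out[:, t0:t0 + t] = 0.0
    return out


rng = np.random.default_rng(0)
feat = rng.standard_normal((80, 300)) * 3.0 + 5.0  # stand-in for a log-Mel spectrogram
train_input = spec_augment(mean_var_normalize(feat), rng)
```

Normalizing before masking makes a zero-valued mask equivalent to filling with the mean, so masks remove information without introducing an out-of-range value. Masking would be applied only at training time; whether the paper masks the log-Mel input or the 2D DCT feature is not stated in the abstract.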

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Digital Media Forensic Detection