Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection

Awais Khan; Khalid Mahmood Malik; Shah Nawaz

arXiv:2309.09837·cs.SD·September 19, 2023·1 cites

Open Access

TL;DR

This paper introduces a spectra-temporal fusion method using novel coefficients and an auto-encoder to detect various voice spoofing attacks, including synthetic, replay, and deepfake, across multiple datasets.

Contribution

It proposes a unified spectra-temporal approach with new coefficients and an auto-encoder to improve spoofing detection robustness across attack types.

Findings

01. Effective against synthetic, replay, and deepfake attacks

02. Robust performance on multiple benchmark datasets

03. Addresses spectral and temporal spoofing artifacts

Abstract

Voice spoofing attacks pose a significant threat to automated speaker verification systems. Existing anti-spoofing methods often simulate specific attack types, such as synthetic or replay attacks. However, in real-world scenarios, the countermeasures are unaware of the generation schema of the attack, necessitating a unified solution. Current unified solutions struggle to detect spoofing artifacts, especially with recent spoofing mechanisms. For instance, the spoofing algorithms inject spectral or temporal anomalies, which are challenging to identify. To this end, we present a spectra-temporal fusion leveraging frame-level and utterance-level coefficients. We introduce a novel local spectral deviation coefficient (SDC) for frame-level inconsistencies and employ a bi-LSTM-based network for sequential temporal coefficients (STC), which capture utterance-level artifacts. Our…

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing