Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms
Penghui Wen, Kun Hu, Wenxi Yue, Sen Zhang, Wanlei Zhou, Zhiyong Wang

TL;DR
This paper introduces S2pecNet, a novel deep learning approach that fuses multi-order spectral patterns and reconstructs spectrograms to improve robustness against audio spoofing attacks, achieving state-of-the-art results.
Contribution
The paper presents a spectral fusion-reconstruction strategy utilizing multi-order spectral patterns for enhanced audio anti-spoofing performance.
Findings
Achieved an equal error rate (EER) of 0.77% on the ASVspoof 2019 logical access (LA) dataset
Outperformed existing methods in anti-spoofing accuracy
Demonstrated effectiveness of multi-order spectral fusion
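For readers unfamiliar with the metric above, the equal error rate is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of a threshold sweep, not the official ASVspoof scoring tool. The function name `compute_eer`, the toy scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: higher score = more likely bona fide. A trial is accepted
    when its score is >= the threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    # Add one threshold above every score so "reject everything" is covered.
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer, best_threshold = None, None, None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer, best_threshold = gap, (far + frr) / 2, t
    return eer, best_threshold


# Toy scores (hypothetical, not from the paper).
bonafide = [0.9, 0.8, 0.7, 0.6]
spoof = [0.1, 0.2, 0.3, 0.65]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.65
```

With a finite set of scores the two error rates rarely cross exactly, so this sketch reports the midpoint at the closest threshold. Evaluation toolkits typically interpolate instead, which can give slightly different values.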
Abstract
Robust audio anti-spoofing has become increasingly challenging due to recent advances in deepfake techniques. While spectrograms have demonstrated their capability for anti-spoofing, the complementary information present in multi-order spectral patterns has not been well explored, which limits their effectiveness against varying spoofing attacks. Therefore, we propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations. Specifically, spectral patterns up to second order are fused in a coarse-to-fine manner, and two branches are designed for the fine-level fusion from the spectral and temporal contexts. A reconstruction from the fused representation to the input spectrograms further reduces the potential loss of fused information. Our method achieved the…
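The general idea of a fusion-reconstruction objective can be illustrated with a toy example: two input feature maps are fused into one representation, and a decoder is asked to rebuild the inputs from it, so that a reconstruction error penalises information lost during fusion. The sketch below is not S2pecNet. It uses a single untrained linear encoder and decoder in NumPy, random arrays as stand-ins for first- and second-order spectral inputs (this summary does not say how the paper derives them), and a hypothetical loss weight `recon_weight`. The coarse-to-fine fusion and the spectral and temporal branches described in the abstract are not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)


def fuse_and_reconstruct(first_order, second_order, latent_dim=8):
    """Toy linear fusion followed by reconstruction of both inputs.

    Both inputs have shape (frames, bins). Weights are random and
    untrained; this only shows how the pieces connect.
    """
    n_bins = first_order.shape[1]
    # Stack the two inputs along the feature axis: (frames, 2 * bins).
    stacked = np.concatenate([first_order, second_order], axis=1)

    w_enc = rng.standard_normal((2 * n_bins, latent_dim)) * 0.1
    w_dec = rng.standard_normal((latent_dim, 2 * n_bins)) * 0.1

    fused = np.tanh(stacked @ w_enc)   # fused representation
    recon = fused @ w_dec              # attempt to rebuild both inputs
    # Mean squared error between the rebuilt and original inputs.
    recon_loss = float(np.mean((recon - stacked) ** 2))
    return fused, recon, recon_loss


# Placeholder inputs: 50 frames, 20 frequency bins each (hypothetical sizes).
spec_first = rng.standard_normal((50, 20))
spec_second = rng.standard_normal((50, 20))

fused, recon, recon_loss = fuse_and_reconstruct(spec_first, spec_second)

# In training, the reconstruction term would be added to a classification
# loss; both values below are placeholders, not figures from the paper.
classification_loss = 0.7
recon_weight = 0.1
total_loss = classification_loss + recon_weight * recon_loss
```

In a trained model the encoder and decoder weights would be learned jointly with the bona fide versus spoof classifier, so that the fused representation stays both discriminative and faithful to its inputs.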
Taxonomy
Topics: Digital Media Forensic Detection · Speech Recognition and Synthesis · Music and Audio Processing
