Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection
Cunhang Fan, Jun Xue, Jianhua Tao, Jiangyan Yi, Chenglong Wang, Chengshi Zheng, Zhao Lv

TL;DR
This paper introduces a novel F0 subband feature and a spatial reconstructed local attention Res2Net model for fake speech detection, achieving state-of-the-art results on the ASVspoof 2019 LA dataset.
Contribution
It proposes a new F0 subband feature and a specialized neural network architecture for improved fake speech detection.
Findings
Achieved an equal error rate (EER) of 0.47% on the ASVspoof 2019 LA dataset.
Outperformed existing single systems in fake speech detection.
Demonstrated the effectiveness of F0 subband features and spatial reconstructed local attention mechanisms.
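For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how it can be computed from detection scores. It is not the paper's or the ASVspoof challenge's scoring code. The function name `compute_eer` and the convention that a higher score means "more likely bonafide" are assumptions made for illustration.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two sets of detection scores.

    Assumption: a higher score means the utterance is more likely bonafide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every distinct score that was observed.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))
    # False rejection rate: bonafide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is read off where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

An EER of 0.47% means that, at the operating point where both error rates are equal, roughly one utterance in two hundred is misclassified in each direction.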
Abstract
The rhythm of bonafide speech is often difficult to replicate, so the fundamental frequency (F0) of synthetic speech differs significantly from that of real speech. The F0 feature is therefore expected to contain discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. To model the F0 subband effectively and improve FSD performance, we also propose the spatial reconstructed local attention Res2Net (SR-LA Res2Net). Specifically, Res2Net serves as the backbone network to obtain multiscale information and is enhanced with a spatial reconstruction mechanism to avoid losing important information when the channel groups are repeatedly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof…
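The multiscale behaviour that the abstract attributes to the Res2Net backbone can be illustrated with a small sketch of the standard Res2Net split-and-connect pattern: channels are split into groups, and each group after the first receives the previous group's output before being transformed. This shows only the generic backbone idea. The spatial reconstruction mechanism and the local attention are not reproduced, because the abstract does not specify them. The function name `res2net_split_connect` and the `tanh` stand-in for the per-group convolution are hypothetical.

```python
import numpy as np

def res2net_split_connect(x, scale=4, transform=None):
    """Apply the Res2Net hierarchical connection to a feature map.

    x: array of shape (channels, freq, time); channels must divide by scale.
    transform: stand-in for the per-group convolution (hypothetical).
    """
    if x.shape[0] % scale != 0:
        raise ValueError("channel count must be divisible by scale")
    if transform is None:
        # Placeholder nonlinearity; a real block would use a 3x3 convolution.
        transform = np.tanh
    groups = np.split(x, scale, axis=0)
    outputs = [groups[0]]  # the first group passes through unchanged
    prev = None
    for i in range(1, scale):
        # Later groups add the previous group's output before transforming,
        # so the effective receptive field grows from group to group.
        inp = groups[i] if prev is None else groups[i] + prev
        prev = transform(inp)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)

if __name__ == "__main__":
    feats = np.ones((8, 2, 3))
    out = res2net_split_connect(feats, scale=4, transform=lambda a: 2 * a)
    print(out[::2, 0, 0])  # one value per channel group
```

Because each group's output is added into the next, information from early groups accumulates across the block, which is the repeated superposition of channel groups that the abstract refers to.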
Taxonomy
Topics: Digital Media Forensic Detection · Speech Recognition and Synthesis · Speech and Audio Processing
