Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

Cunhang Fan; Jun Xue; Jianhua Tao; Jiangyan Yi; Chenglong Wang; Chengshi Zheng; Zhao Lv

arXiv:2308.09944·cs.SD·July 9, 2024·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces a novel F0 subband feature and a spatial reconstructed local attention Res2Net model for fake speech detection, achieving state-of-the-art results on the ASVspoof 2019 LA dataset.

Contribution

It proposes a new F0 subband feature and a specialized neural network architecture for improved fake speech detection.

Findings

01 Achieved an EER of 0.47% on the ASVspoof 2019 LA dataset.

02 Outperformed existing single systems in fake speech detection.

03 Demonstrated the effectiveness of the F0 subband feature and the spatial reconstructed local attention mechanism.
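
The equal error rate (EER) quoted in the findings is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of that computation, assuming that a higher score means "more likely bonafide". The function name `compute_eer` and the toy scores are illustrative and are not taken from the paper or its repository; evaluation toolkits may handle ties and threshold selection differently.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detection scores.

    Assumption: a higher score means the utterance looks more bonafide.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in thresholds:
        # False rejection: bonafide speech scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed speech scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)

    return best[1], best[2]


# Toy example: one bonafide utterance scores lower than one spoofed utterance.
bonafide = [0.9, 0.8, 0.7, 0.35]
spoof = [0.4, 0.3, 0.2, 0.1]
print(compute_eer(bonafide, spoof))  # → (0.25, 0.4)
```

An EER of 0.47% means that, at the balanced threshold, roughly 1 in 200 trials of each class is misclassified.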

Abstract

The rhythm of bonafide speech is often difficult to replicate, so the fundamental frequency (F0) of synthetic speech differs significantly from that of real speech. The F0 feature is therefore expected to contain discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to model the F0 subband effectively and improve FSD performance, we propose the spatial reconstructed local attention Res2Net (SR-LA Res2Net). Specifically, Res2Net is used as the backbone network to obtain multiscale information and is enhanced with a spatial reconstruction mechanism to avoid losing important information as the channel groups are repeatedly superimposed. Local attention is further designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof…
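
For context, the multiscale behaviour of a Res2Net block is usually written as below. This is the standard Res2Net formulation, given only as background: the abstract does not state the equations of the SR-LA variant, so the symbols $x_i$ (the $i$-th channel group of the input), $y_i$ (its output), $K_i$ (a convolution applied to group $i$) and $s$ (the number of groups, or scale) are assumptions borrowed from the original Res2Net design rather than this paper's notation.

```latex
y_i =
\begin{cases}
x_i, & i = 1,\\
K_i(x_i), & i = 2,\\
K_i(x_i + y_{i-1}), & 2 < i \le s.
\end{cases}
```

Each group's output is added to the next group's input, which appears to be the repeated superposition of channel groups that the abstract's spatial reconstruction mechanism is meant to address.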

Code & Models

Repositories

JunXue-tech/SRLARes2NetF0Subband
PyTorch · Official

Taxonomy

Topics: Digital Media Forensic Detection · Speech Recognition and Synthesis · Speech and Audio Processing