Physiological-Physical Feature Fusion for Automatic Voice Spoofing Detection
Junxiao Xue, Hao Zhou, Yabo Wang

TL;DR
This paper introduces a physiological-physical feature fusion method that combines SE-DenseNet and SE-Res2Net networks to improve voice spoofing detection, reporting 4-10% improvements in t-DCF and EER on the ASVspoof 2019 dataset.
Contribution
It proposes a new fusion approach combining physiological and physical features with SE-DenseNet and SE-Res2Net neural networks for enhanced spoofing detection.
Findings
Improves t-DCF and EER by 4-10% on the ASVspoof 2019 dataset.
Effective in both the logical access and physical access scenarios.
The dense connection pattern offers high parameter efficiency and enhances feature transmission.
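For readers unfamiliar with the EER metric named in the findings, the following is a minimal sketch of how an equal error rate can be estimated from detection scores. It is not the official ASVspoof scoring script. The function name `compute_eer`, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate (EER) and the threshold where it occurs.

    Assumed convention (hypothetical, for illustration): a higher score
    means the trial is more likely bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score observed, in sorted order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is the operating point where FAR and FRR are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy scores (made up): one spoofed trial outscores one bona fide trial.
eer, threshold = compute_eer([0.6, 0.4], [0.5, 0.3])
```

A lower EER indicates better separation between bona fide and spoofed speech; perfectly separable scores give an EER of 0.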
Abstract
Speaker verification systems have been used in many production scenarios in recent years. Unfortunately, they are still highly prone to different kinds of spoofing attacks, such as voice conversion and speech synthesis. In this paper, we propose a new method based on physiological-physical feature fusion to deal with voice spoofing attacks. This method involves feature extraction, a densely connected convolutional neural network with squeeze-and-excitation blocks (SE-DenseNet), a multi-scale residual neural network with squeeze-and-excitation blocks (SE-Res2Net), and feature fusion strategies. We first pre-train a convolutional neural network using the speaker's voice and face in the video as supervisory signals. This network can extract physiological features from speech. Then we use SE-DenseNet and SE-Res2Net to extract physical features. Such a dense connection pattern has high parameter…
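To make the "squeeze and excitation block" mentioned in the abstract concrete, here is a minimal NumPy sketch of the generic squeeze-and-excitation operation on a single feature map. It is not the authors' implementation. The function name `se_block`, the reduction ratio of 4, the tensor shapes, and the random weights are all assumptions chosen for illustration; in SE-DenseNet or SE-Res2Net the weights would be learned.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Generic squeeze-and-excitation over a feature map of shape (C, H, W).

    w1: (C // r, C), b1: (C // r,)  -> bottleneck layer
    w2: (C, C // r), b2: (C,)       -> expansion layer
    Returns a recalibrated feature map with the same shape as x.
    """
    # Squeeze: global average pooling collapses each channel to one value.
    z = x.mean(axis=(1, 2))                        # shape (C,)
    # Excitation: bottleneck with ReLU, then expansion with sigmoid gating.
    h = np.maximum(0.0, w1 @ z + b1)               # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))       # shape (C,), values in (0, 1)
    # Scale: reweight every channel by its gate value.
    return x * s[:, None, None]

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 5, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)
y = se_block(x, w1, b1, w2, b2)
```

The gate vector lets the network emphasize informative channels and suppress less useful ones at the cost of two small fully connected layers per block.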
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
