The SYSU System for the Interspeech 2015 Automatic Speaker Verification Spoofing and Countermeasures Challenge
Shitao Weng, Shushan Chen, Lei Yu, Xuewei Wu, Weicheng Cai, Zhi Liu, Ming Li

TL;DR
This paper presents a multi-feature fusion system that combines several i-vector subsystems with supervised classifiers to detect spoofed speech in speaker verification, reaching an equal error rate (EER) of 0.29% on the development set and 3.26% on the test set of the Interspeech 2015 challenge.
Contribution
It introduces a novel fusion approach combining acoustic, phase, and phonetic features with multiple classifiers for improved spoofing detection.
Findings
Achieved 0.29% EER on the development set
Achieved 3.26% EER on the test set
Improved performance through both feature-level and score-level fusion
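The equal error rate (EER) quoted above is the operating point at which the false acceptance rate (spoofed speech accepted as genuine) equals the false rejection rate (genuine speech rejected). The following is a minimal sketch of how an EER can be estimated from two lists of detection scores with a simple threshold sweep. It assumes that higher scores mean "more likely genuine"; the function name `equal_error_rate` and the toy scores are hypothetical, and the challenge's official scoring procedure may differ.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as genuine when score >= threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy scores (made up for illustration, not from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine, spoof))
```

With only a handful of scores the estimate is coarse; on a real trial list the sweep yields a much finer approximation of the crossing point.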
Abstract
Many existing speaker verification systems are reported to be vulnerable to different spoofing attacks, for example speaker-adapted speech synthesis, voice conversion, and playback. To detect these spoofed speech signals as a countermeasure, we propose a score-level fusion approach with several different i-vector subsystems. We show that the acoustic-level Mel-frequency cepstral coefficient (MFCC) features, the phase-level modified group delay cepstral coefficient (MGDCC) features, and the phonetic-level phoneme posterior probability (PPP) tandem features are effective for the countermeasure. Furthermore, feature-level fusion of these features before i-vector modeling also enhances the performance. A polynomial kernel support vector machine is adopted as the supervised classifier. In order to enhance the generalizability of the countermeasure, we also adopted the cosine…
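To make the idea of score-level fusion concrete, here is a minimal sketch in which each subsystem's scores are z-normalized and then combined with a weighted sum. This is an illustrative assumption only: the text above does not state how the authors normalize or weight the subsystem scores, and the names `z_normalize`, `fuse_scores`, the subsystem keys, and the example values are all hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale one subsystem's scores to zero mean, unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: the subsystem carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(subsystem_scores, weights=None):
    """Combine per-trial scores from several subsystems into one score list.

    subsystem_scores: dict mapping a subsystem name to its list of scores,
    one score per trial, in the same trial order for every subsystem.
    weights: optional dict of per-subsystem weights (assumed equal if omitted).
    """
    names = list(subsystem_scores)
    n_trials = len(subsystem_scores[names[0]])
    if any(len(subsystem_scores[n]) != n_trials for n in names):
        raise ValueError("every subsystem must score the same trials")
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    normalized = {n: z_normalize(subsystem_scores[n]) for n in names}
    return [
        sum(weights[n] * normalized[n][i] for n in names)
        for i in range(n_trials)
    ]


# Hypothetical scores from two subsystems on three trials.
scores = {"mfcc": [1.0, 2.0, 3.0], "mgdcc": [10.0, 20.0, 30.0]}
fused = fuse_scores(scores)
print(fused)
```

Normalizing before summing keeps a subsystem with a wide score range from dominating the fused score; in practice the weights would typically be tuned on a development set rather than fixed to equal values.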
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
