Investigation of Synthetic Speech Detection Using Frame- and Segment-Specific Importance Weighting

Ali Khodabakhsh; Cenk Demiroglu

arXiv:1610.03009·cs.SD·October 11, 2016

TL;DR

This paper proposes three algorithms that weigh the likelihood-ratio scores of individual frames, phonemes, and sound classes by their importance for synthetic speech detection. They yield significant gains over the baseline for known attacks but only limited improvement for unknown attack types.

Contribution

The paper introduces three importance-weighting algorithms, operating on frames, phonemes, and sound classes respectively, in contrast to earlier synthetic speech detectors that mostly weighted all segments of speech equally.

Findings

01

Significant improvement over the baseline for known attack methods, i.e. those used in training the detectors.

02

Limited improvement for unknown attack types.

03

The benefit of weighted scoring over the baseline depends on whether the attack method was seen during training.

Abstract

Speaker verification systems are vulnerable to spoofing attacks, which presents a major problem in their real-life deployment. To date, most of the proposed synthetic speech detectors (SSDs) have weighted the importance of different segments of speech equally. However, different attack methods have different strengths and weaknesses, and the traces that they leave may be short- or long-term acoustic artifacts. Moreover, those may occur for only particular phonemes or sounds. Here, we propose three algorithms that weigh likelihood-ratio scores of individual frames, phonemes, and sound-classes depending on their importance for the SSD. Significant improvement over the baseline system has been obtained for known attack methods that were used in training the SSDs. However, improvement with unknown attack types was not substantial. Thus, the type of distortions that were caused by the unknown…
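
The idea of weighting per-frame likelihood-ratio scores can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say how the weights are obtained, so the weights below are hand-picked, and the function name, the example scores, and the sign convention (negative log-likelihood ratio suggests synthetic speech) are all assumptions made for illustration. With uniform weights the function reduces to the equally weighted baseline the abstract describes.

```python
from typing import Sequence


def weighted_llr_score(frame_llrs: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of per-frame log-likelihood ratios.

    Hypothetical helper: each weight expresses how much a frame should
    count toward the utterance-level decision.
    """
    if len(frame_llrs) != len(weights):
        raise ValueError("frame_llrs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, frame_llrs)) / total_weight


# Made-up per-frame scores; negative values are assumed to indicate synthetic speech.
frame_llrs = [0.2, -1.5, 0.1, -2.0]

# Equal weights: every frame contributes the same amount (the baseline).
uniform_weights = [1.0, 1.0, 1.0, 1.0]

# Hand-picked weights that emphasize the frames assumed to carry artifacts.
emphasis_weights = [0.5, 2.0, 0.5, 2.0]

baseline_score = weighted_llr_score(frame_llrs, uniform_weights)
weighted_score = weighted_llr_score(frame_llrs, emphasis_weights)
```

In this toy case the emphasized frames pull the utterance score further toward the synthetic side than the equal-weight average does. The same function could be applied at phoneme or sound-class level by passing one score and one weight per unit instead of per frame.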

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD