Self-supervised pre-training with acoustic configurations for replay spoofing detection

Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, and Ha-Jin Yu

arXiv:1910.09778 · cs.LG · August 20, 2020 · 1 citation

TL;DR

This paper introduces a self-supervised pretraining method for replay spoofing detection that leverages acoustic configurations from existing datasets, improving detection accuracy without extensive physical data collection.

Contribution

It proposes a novel self-supervised framework focusing on acoustic configurations, enabling effective pretraining for replay spoofing detection using datasets from related tasks.

Findings

01  Outperforms baseline by 30% in detection accuracy

02  Effective use of existing datasets for pretraining

03  Improves robustness in replay spoofing detection

Abstract

Constructing a dataset for replay spoofing detection requires a physical process of playing an utterance and re-recording it, which makes large-scale datasets difficult to collect. In this study, we propose a self-supervised framework for pretraining acoustic configurations using datasets published for other tasks, such as speaker verification. Here, acoustic configurations refer to the environmental factors generated during the process of voice recording but not the voice itself, including microphone type, location, and ambient noise level. Specifically, we select pairs of segments from utterances and train deep neural networks to determine whether the acoustic configurations of the two segments are identical. We validate the effectiveness of the proposed method on the ASVspoof 2019 physical access dataset using two well-performing systems. The experimental results…
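
The pair-selection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes, as a labeling rule not stated in the text above, that two segments cropped from the same utterance share an acoustic configuration (label 1) and that segments from different utterances do not (label 0). The names `crop_segment` and `sample_segment_pair`, the 50/50 sampling ratio, and the representation of an utterance as a plain list of samples are all hypothetical choices made for illustration.

```python
import random


def crop_segment(samples, segment_len, rng):
    """Return a random contiguous crop of `segment_len` samples."""
    if len(samples) < segment_len:
        raise ValueError("utterance is shorter than the requested segment")
    start = rng.randrange(len(samples) - segment_len + 1)
    return samples[start:start + segment_len]


def sample_segment_pair(utterances, segment_len, rng):
    """Draw one training pair (segment_a, segment_b, label).

    ASSUMPTION (not from the source text): segments taken from the same
    utterance are treated as having identical acoustic configurations
    (label 1); segments from two different utterances are treated as
    having different configurations (label 0).
    """
    if len(utterances) < 2:
        raise ValueError("need at least two utterances to form negative pairs")

    same = rng.random() < 0.5  # hypothetical 50/50 positive/negative ratio
    if same:
        idx_a = idx_b = rng.randrange(len(utterances))
    else:
        # rng.sample guarantees two distinct utterance indices.
        idx_a, idx_b = rng.sample(range(len(utterances)), 2)

    seg_a = crop_segment(utterances[idx_a], segment_len, rng)
    seg_b = crop_segment(utterances[idx_b], segment_len, rng)
    return seg_a, seg_b, int(same)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "utterances": each is a list filled with its own index.
    toy_utterances = [[i] * 50 for i in range(4)]
    seg_a, seg_b, label = sample_segment_pair(toy_utterances, 10, rng)
    print(len(seg_a), len(seg_b), label)
```

In a real pipeline the two segments would be fed to a network trained with a binary objective on `label`; that network, the feature extraction, and the way the pretrained weights are transferred to the spoofing detector are outside the scope of this sketch.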

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing