Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection

Lin Zhang; Xin Wang; Erica Cooper; Junichi Yamagishi

arXiv:2107.14132 · cs.SD · September 1, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a multi-task learning framework, built on a light CNN enhanced with squeeze-and-excitation blocks (SELCNN), that detects spoofing at the segmental and utterance levels simultaneously and performs better than single-task models.

Contribution

The paper proposes SELCNN, an LCNN with inserted squeeze-and-excitation blocks, and benchmarks multi-task learning architectures (uni-branch/multi-branch) and training strategies (from-scratch/warm-up) for spoof detection at the segmental and utterance levels.

Findings

01. Multi-task models outperform single-task models.
02. Binary-branch architecture better utilizes multi-level information.
03. Fine-tuning with warm-up models yields superior results.

Abstract

In this paper, we provide a series of multi-tasking benchmarks for simultaneously detecting spoofing at the segmental and utterance levels in the PartialSpoof database. First, we propose the SELCNN network, which inserts squeeze-and-excitation (SE) blocks into a light convolutional neural network (LCNN) to enhance the capacity of hidden feature selection. Then, we implement multi-task learning (MTL) frameworks with SELCNN followed by bidirectional long short-term memory (Bi-LSTM) as the basic model. We discuss MTL in PartialSpoof in terms of architecture (uni-branch/multi-branch) and training strategies (from-scratch/warm-up) step-by-step. Experiments show that the multi-task model performs relatively better than single-task models. Also, in MTL, a binary-branch architecture more adequately utilizes information from two levels than a uni-branch model. For the binary-branch architecture,…
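
The abstract describes SELCNN as an LCNN with squeeze-and-excitation (SE) blocks inserted for hidden feature selection. The following is a minimal, dependency-free sketch of those two building blocks: the Max-Feature-Map activation that LCNNs use, and an SE gate applied to its output. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy input, the weight values, and the placement of the SE gate directly after the activation are all hypothetical, and biases and convolutions are omitted.

```python
import math


def max_feature_map(channels):
    """Max-Feature-Map: split channels into two halves, keep the element-wise max."""
    half = len(channels) // 2
    return [
        [[max(a, b) for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(channels[:half], channels[half:])
    ]


def squeeze_excite(channels, w_reduce, w_expand):
    """Rescale each channel by a gate in (0, 1) computed from global channel statistics."""
    # Squeeze: global average over each channel's 2-D map.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    # Excite: bottleneck layer with ReLU, then expansion layer with sigmoid (biases omitted).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, channels)]


# Toy input (hypothetical values): 4 channels, each a 2x2 time-by-frequency map.
x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 5.0], [1.0, 1.0]],
    [[2.0, 1.0], [0.0, 6.0]],
    [[4.0, 4.0], [4.0, 0.0]],
]
mfm_out = max_feature_map(x)       # 4 channels in, 2 channels out
w_reduce = [[0.5, 0.5]]            # hypothetical weights: 2 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0]]         # hypothetical weights: 1 hidden unit -> 2 gates
se_out = squeeze_excite(mfm_out, w_reduce, w_expand)
```

With these toy weights the first channel receives a gate close to 1 and the second a gate close to 0, which is the "feature selection" effect in miniature: the SE block keeps the shape of the feature maps and only reweights channels. In a trained network the two weight matrices would be learned jointly with the rest of the model.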

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing