Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio

Lin Zhang; Xin Wang; Erica Cooper; Mireia Diez; Federico Landini; Nicholas Evans; Junichi Yamagishi

arXiv:2406.07816 · eess.AS · June 13, 2024

TL;DR

This paper introduces Spoof Diarization, a new task that locates spoofed regions in audio and clusters them by spoofing method. It also proposes a benchmark model and evaluation metrics, and shows that the task is highly complex.

Contribution

It defines the novel task of Spoof Diarization, proposes the Countermeasure-Condition Clustering (3C) model as a benchmark, and establishes evaluation protocols for partially spoofed audio scenarios.

Findings

01. Spoof diarization is highly complex, even under simplified conditions.

02. The 3C model effectively supports spoof localization and clustering.

03. Training countermeasures improves spoof diarization performance.

Abstract

This paper defines Spoof Diarization as a novel task in the Partial Spoof (PS) scenario. It aims to determine "what spoofed when", which involves not only locating spoofed regions but also clustering them according to the spoofing methods that produced them. As a pioneering study in spoof diarization, we focus on defining the task, establishing evaluation metrics, and proposing a benchmark model, namely the Countermeasure-Condition Clustering (3C) model. Using this model, we first explore how to effectively train countermeasures to support spoof diarization using three labeling schemes. We then use spoof localization predictions to improve diarization performance. This first study reveals the high complexity of the task, even in restricted scenarios where only a single speaker per audio file and an oracle number of spoofing methods are considered. Our code is available at…
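To make the "what spoofed when" output concrete, here is a minimal toy sketch of the task under the abstract's restricted setting (one speaker per file, oracle number of spoofing methods). It is not the 3C model or the paper's metric: `cm_scores` and `embeddings` are hypothetical stand-ins for per-frame countermeasure outputs, the 0.5 threshold and plain k-means are assumptions, and `frame_error_rate` is an illustrative DER-style score that counts missed spoof frames, false alarms, and method confusion after the best one-to-one mapping of cluster IDs to reference methods.

```python
import itertools
import math

BONAFIDE = 0  # label reserved for bona fide (non-spoofed) frames


def localize(cm_scores, threshold=0.5):
    """Spoof localization: flag frames whose (hypothetical) CM score is >= threshold."""
    return [s >= threshold for s in cm_scores]


def kmeans(points, k, iters=20):
    """Plain k-means; initialized with the first k distinct points for reproducibility."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer distinct points than clusters")
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def spoof_diarize(cm_scores, embeddings, n_methods, threshold=0.5):
    """Per-frame labels: 0 = bona fide, 1..n_methods = spoof cluster ID."""
    spoof_idx = [i for i, s in enumerate(localize(cm_scores, threshold)) if s]
    labels = [BONAFIDE] * len(cm_scores)
    if spoof_idx:
        k = min(n_methods, len(spoof_idx))  # oracle number of spoofing methods
        clusters = kmeans([embeddings[i] for i in spoof_idx], k)
        for i, c in zip(spoof_idx, clusters):
            labels[i] = c + 1
    return labels


def frame_error_rate(ref, hyp):
    """DER-style frame error: bona fide must match exactly; spoof cluster IDs are
    mapped to reference methods by the best one-to-one assignment."""
    ref_ids = sorted({r for r in ref if r != BONAFIDE})
    hyp_ids = sorted({h for h in hyp if h != BONAFIDE})
    # Pad with None so surplus hypothesis clusters map to "no method" (always wrong).
    targets = ref_ids + [None] * max(0, len(hyp_ids) - len(ref_ids))
    best = len(ref)
    for perm in itertools.permutations(targets, len(hyp_ids)):
        mapping = {BONAFIDE: BONAFIDE, **dict(zip(hyp_ids, perm))}
        best = min(best, sum(mapping[h] != r for r, h in zip(ref, hyp)))
    return best / len(ref)


# Toy data (made up): 8 frames, two spoofing methods with separable 2-D embeddings.
cm_scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.95, 0.85, 0.05]
embeddings = [(0, 0), (0, 0), (1, 1), (1, 1.1), (0, 0), (5, 5), (5, 5.1), (0, 0)]
ref = [0, 0, 1, 1, 0, 2, 2, 0]
hyp = spoof_diarize(cm_scores, embeddings, n_methods=2)
print(hyp, frame_error_rate(ref, hyp))  # [0, 0, 1, 1, 0, 2, 2, 0] 0.0
```

Because cluster IDs are arbitrary, the score searches over label mappings, so a hypothesis that swaps methods 1 and 2 still scores 0.0, while merging both methods into one cluster is penalized as confusion. Even this toy setup shows why the task couples localization errors (misses, false alarms) with clustering errors.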

Code & Models

Repositories

nii-yamagishilab/partialspoof
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Focus