ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection and SASV Systems for the ASVspoof5 Challenge

Juan M. Martín-Doñas; Eros Roselló; Angel M. Gomez; Aitor Álvarez; Iván López-Espejo; Antonio M. Peinado

arXiv:2408.10361 · eess.AS · August 21, 2024

Open Access

TL;DR

This paper describes the ASASVIcomtech team's participation in the ASVspoof5 Challenge, covering both speech deepfake detection (Track 1) and spoofing-aware speaker verification (Track 2). It combines an analysis of the challenge data with self-supervised models and ensemble techniques, achieving competitive results in both tracks.

Contribution

The paper presents an analysis of the challenge data and explores multiple open-condition systems that use self-supervised models and ensemble methods for speech deepfake detection and spoofing-aware speaker verification.

Findings

01

The closed-condition system for Track 1, built on a deep complex convolutional recurrent architecture, did not yield noteworthy results.

02

Open-condition systems leveraging self-supervised models showed promising performance.

03

Ensemble systems achieved very competitive results in both challenge tracks.

Abstract

This paper presents the work carried out by the ASASVIcomtech team, made up of researchers from Vicomtech and the University of Granada, for the ASVspoof5 Challenge. The team participated in both Track 1 (speech deepfake detection) and Track 2 (spoofing-aware speaker verification). The work started with an analysis of the available challenge data, which was regarded as an essential step to avoid potential biases in the trained models, and whose main conclusions are presented here. As for the proposed approaches, a closed-condition system employing a deep complex convolutional recurrent architecture was developed for Track 1, although, unfortunately, no noteworthy results were achieved. On the other hand, different possibilities of open-condition systems, based on leveraging self-supervised models, augmented training data from previous challenges, and novel vocoders, were…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling