All-for-One and One-For-All: Deep learning-based feature fusion for Synthetic Speech Detection

Daniele Mari, Davide Salvi, Paolo Bestagini, and Simone Milani

arXiv:2307.15555 · cs.SD · July 31, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a deep learning-based feature fusion model for synthetic speech detection, improving accuracy and robustness against anti-forensic attacks compared to existing methods.

Contribution

It proposes a novel fusion approach combining three feature sets, enhancing detection performance and generalization in synthetic speech detection tasks.

Findings

01

Achieved better performance than state-of-the-art methods

02

Demonstrated robustness to anti-forensic attacks

03

Proved effective across multiple datasets and scenarios

Abstract

Recent advances in deep learning and computer vision have made the synthesis and counterfeiting of multimedia content more accessible than ever, creating potential threats from malicious users. In the audio field, we are witnessing the growth of speech deepfake generation techniques, which call for the development of synthetic speech detection algorithms to counter malicious uses such as fraud or identity theft. In this paper, we consider three different feature sets proposed in the literature for the synthetic speech detection task and present a model that fuses them, achieving better overall performance than state-of-the-art solutions. The system was tested on different scenarios and datasets to demonstrate its robustness to anti-forensic attacks and its generalization capabilities.

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing