Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs

Md Awsafur Rahman; Bishmoy Paul; Najibul Haque Sarker; Zaber Ibn Abdul Hakim; Shaikh Anowarul Fattah; Mohammad Saquib

arXiv:2309.08146·cs.SD·September 18, 2023·2 cites

Open Access · 1 Repo

TL;DR

This paper introduces Syn-Att, a semi-supervised ensemble CNN approach for attributing synthetic speech to its generator, significantly improving robustness and accuracy in distinguishing among multiple synthetic speech algorithms.

Contribution

It presents a novel semi-supervised ensemble CNN method for synthetic speech attribution, enhancing robustness and generalization across different datasets.

Findings

01 Outperforms top methods by 12-13% on strongly perturbed data

02 Achieves 1-2% accuracy improvement on less perturbed data

03 Validated on datasets with 18,000 and 10,000 synthetic speeches

Abstract

With the huge technological advances introduced by deep learning in audio and speech processing, many novel synthetic speech techniques have achieved incredibly realistic results. Because these methods generate realistic fake human voices, they can be used in malicious acts such as impersonation, fake news spreading, spoofing, and media manipulation. Hence, the ability to distinguish synthetic from natural speech has become an urgent necessity. Moreover, being able to tell which algorithm was used to generate a synthetic speech track can be of preeminent importance in tracking down the culprit. In this paper, a novel strategy is proposed to attribute a synthetic speech track to the generator used to synthesize it. The proposed detector transforms the audio into a log-mel spectrogram, extracts features using a CNN, and classifies the track as one of five known algorithms or as an unknown algorithm, utilizing…
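The final classification step described in the abstract can be illustrated with a small sketch. The excerpt is truncated before it says how the ensemble members are combined, so the probability averaging below is an assumption, not the authors' confirmed method. The label names, the `ensemble_predict` helper, and the logits are all hypothetical; the only details taken from the paper are the six-way output (five known generators plus one unknown class) and the use of several CNNs.

```python
import math

# Hypothetical label set: five known generators plus one "unknown" class.
LABELS = ["known_0", "known_1", "known_2", "known_3", "known_4", "unknown"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logits_per_model):
    """Average per-model softmax outputs; return (label, averaged probabilities).

    Assumption: simple unweighted averaging. The paper may combine models differently.
    """
    probs = [softmax(logits) for logits in logits_per_model]
    n_models = len(probs)
    avg = [sum(p[i] for p in probs) / n_models for i in range(len(LABELS))]
    best = max(range(len(LABELS)), key=lambda i: avg[i])
    return LABELS[best], avg


# Made-up logits from three hypothetical CNNs for a single audio clip.
example_logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0, 0.5],
    [0.0, 1.5, 0.2, 0.1, -0.5, 1.0],
    [0.2, 0.4, 0.1, 0.0, -0.2, 1.2],
]
label, avg_probs = ensemble_predict(example_logits)
```

In this toy input the third model favors the unknown class, but the averaged probabilities still favor one of the known generators, which is the kind of disagreement an ensemble is meant to smooth out.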

Code & Models

Repositories

awsaf49/synatt · tf · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing