Analyzing the Impact of Splicing Artifacts in Partially Fake Speech Signals
Viola Negroni, Davide Salvi, Paolo Bestagini, Stefano Tubaro

TL;DR
This paper investigates the artifacts that splicing introduces into partially fake speech signals. It shows that these artifacts alone allow spliced tracks to be detected without training a detector, and highlights how hard it is to create spliced audio that is not easily detected.
Contribution
It analyzes the splicing artifacts produced by signal concatenation in partially fake speech, demonstrates that they enable effective detection without training, and assesses whether they introduce bias in existing datasets.
Findings
Detection EER of 6.16% on PartialSpoof dataset
Detection EER of 7.36% on HAD dataset
Splicing artifacts can be exploited for detection without training
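For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate and the false rejection rate are equal. The sketch below is a generic, threshold-sweeping approximation written for illustration. It is not the authors' evaluation code, and the function name `compute_eer`, the label convention (1 = spliced, 0 = bona fide), and the toy scores are all assumptions.

```python
def compute_eer(scores, labels):
    """Approximate the equal error rate (EER).

    Assumed convention: a higher score means "more likely spliced";
    labels use 1 for spliced tracks and 0 for bona fide tracks.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    best_gap, eer = float("inf"), 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)  # bona fide flagged as spliced
        frr = sum(s < t for s in pos) / len(pos)   # spliced tracks missed
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(f"EER = {compute_eer(toy_scores, toy_labels):.2%}")
```

With a finite set of scores the two error rates may never match exactly, so this sketch reports their average at the closest threshold. Evaluation toolkits often interpolate instead.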
Abstract
Speech deepfake detection has recently gained significant attention within the multimedia forensics community. Related issues have also been explored, such as the identification of partially fake signals, i.e., tracks that include both real and fake speech segments. However, generating high-quality spliced audio is not as straightforward as it may appear. Spliced signals are typically created through basic signal concatenation. This process could introduce noticeable artifacts that can make the generated data easier to detect. We analyze spliced audio tracks resulting from signal concatenation, investigate their artifacts and assess whether such artifacts introduce any bias in existing datasets. Our findings reveal that by analyzing splicing artifacts, we can achieve a detection EER of 6.16% and 7.36% on PartialSpoof and HAD datasets, respectively, without needing to train any detector.…
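To make concrete why basic signal concatenation can leave a trace, the sketch below joins two synthetic tones and locates the waveform discontinuity at the join. It is a toy illustration under stated assumptions, not the analysis performed in the paper. The helper names (`sine`, `splice`, `largest_jump`), the 16 kHz sample rate, and the use of pure tones in place of speech are all hypothetical choices made for clarity.

```python
import math


def sine(freq_hz, n_samples, sample_rate=16000, phase=0.0):
    """Generate a pure tone as a list of floats (a stand-in for speech)."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
        for n in range(n_samples)
    ]


def splice(first, second):
    """Naive splicing: plain concatenation with no cross-fade or alignment."""
    return first + second


def largest_jump(signal):
    """Return (index, size) of the largest sample-to-sample difference."""
    diffs = [abs(signal[i + 1] - signal[i]) for i in range(len(signal) - 1)]
    idx = max(range(len(diffs)), key=diffs.__getitem__)
    return idx, diffs[idx]


# The first segment ends near a trough and the second starts at a peak,
# so the concatenation point carries an abrupt jump in amplitude.
segment_a = sine(200, 1020)
segment_b = sine(200, 1000, phase=math.pi / 2)
spliced = splice(segment_a, segment_b)

idx, size = largest_jump(spliced)
print(f"largest jump of {size:.3f} between samples {idx} and {idx + 1}")
```

Within either tone, consecutive samples differ by less than 0.08, so the jump at the join stands out clearly. Real speech is far less regular, and a first-difference check like this one would likely be too crude for it. The point is only that unsmoothed concatenation can leave a measurable discontinuity.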
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Speech Recognition and Synthesis
Methods: Softmax · Attention Is All You Need
