Waveform Boundary Detection for Partially Spoofed Audio
Zexin Cai, Weiqing Wang, Ming Li

TL;DR
This paper introduces a deep learning-based waveform boundary detection system for identifying and locating partially spoofed audio segments, addressing a critical security threat posed by audio deepfakes.
Contribution
It presents a frame-level detection method, trained and evaluated on ADD2022 Challenge data, that both detects partially spoofed audio and locates the manipulated segments.
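To illustrate how frame-level output can be used to locate manipulated pieces, here is a minimal sketch that groups consecutive flagged frames into time segments. It is an assumption made for illustration, not the authors' post-processing: the function name `frames_to_segments`, the 0.5 threshold, and the 20 ms frame shift are all hypothetical, and the scores are assumed to be per-frame probabilities that a frame is manipulated.

```python
def frames_to_segments(frame_scores, threshold=0.5, frame_shift=0.02):
    """Group consecutive frames scoring >= threshold into (start, end) times.

    frame_scores: per-frame scores, assumed higher = more likely manipulated.
    threshold:    hypothetical decision threshold (not from the paper).
    frame_shift:  hypothetical hop between frames, in seconds.
    """
    segments = []
    start = None  # index of the first frame of the currently open segment
    for i, score in enumerate(frame_scores):
        flagged = score >= threshold
        if flagged and start is None:
            start = i  # a flagged run begins here
        elif not flagged and start is not None:
            # the run ended on the previous frame; close the segment
            segments.append((start * frame_shift, i * frame_shift))
            start = None
    if start is not None:
        # the utterance ended while a flagged run was still open
        segments.append((start * frame_shift, len(frame_scores) * frame_shift))
    return segments


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2, 0.7]
    print(frames_to_segments(scores, threshold=0.5, frame_shift=1.0))
```

An utterance could then be treated as partially spoofed whenever the returned list is non-empty, although a real system would likely smooth the scores first to suppress isolated single-frame detections.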
Findings
Achieved an equal error rate (EER) of 6.58% on the ADD2022 challenge test set.
Outperformed existing systems in detecting and locating partial audio spoofing.
Evaluated various acoustic features and network configurations for optimal performance.
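For readers unfamiliar with the metric reported above, the equal error rate is the operating point at which the false acceptance rate and the false rejection rate are equal. The sketch below is a simple threshold sweep, not the official ADD2022 scoring tool. The function name `compute_eer` and the label convention (1 for the positive class, 0 for the negative class, higher score means more positive) are assumptions made for illustration.

```python
def compute_eer(scores, labels):
    """Approximate the equal error rate by sweeping every observed score.

    scores: one detection score per trial (assumed: higher = more positive).
    labels: 1 for positive trials, 0 for negative trials (assumed convention).
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative trial")

    # Candidate thresholds: every distinct score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)

    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in thresholds:
        far = sum(s >= t for s in neg) / len(neg)  # negatives accepted
        frr = sum(s < t for s in pos) / len(pos)   # positives rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    eer, thr = compute_eer([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With a finite number of trials the two error rates rarely cross exactly, so this sketch reports the average of the two at the closest point. Evaluation toolkits often interpolate along the ROC curve instead.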
Abstract
This paper proposes a waveform boundary detection system for audio spoofing attacks containing partially manipulated segments. Partially spoofed (fake) audio, in which part of the utterance is replaced with either synthetic or natural audio clips, has recently been reported as one scenario of audio deepfakes. As deepfakes can be a threat to social security, the detection of such spoofed audio is essential. Accordingly, we propose to address the problem with a deep learning-based frame-level detection system that can detect partially spoofed audio and locate the manipulated pieces. Our proposed method is trained and evaluated on data provided by the ADD2022 Challenge. We evaluate our detection model with respect to various acoustic features and network configurations. As a result, our detection system achieves an equal error rate (EER) of 6.58% on the ADD2022 challenge test set, which is…
Taxonomy
Topics: Digital Media Forensic Detection · Music and Audio Processing · Speech Recognition and Synthesis
Methods: Test
