Waveform Boundary Detection for Partially Spoofed Audio

Zexin Cai; Weiqing Wang; Ming Li

arXiv:2211.00226 · eess.AS · November 2, 2022 · 1 cites

Open Access

TL;DR

This paper introduces a deep learning-based waveform boundary detection system for identifying and locating partially spoofed audio segments, addressing a critical security threat posed by audio deepfakes.

Contribution

It presents a novel frame-level detection method trained on ADD2022 data, achieving state-of-the-art results in locating manipulated audio segments.

Findings

01. Achieved an EER of 6.58% on the ADD2022 test set.

02. Outperformed existing systems in detecting and locating partial audio spoofing.

03. Evaluated various acoustic features and network configurations for optimal performance.
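
The EER cited in the first finding is the operating point at which the false positive rate equals the false negative rate. The sketch below shows one common way to estimate it from per-utterance scores. The function name `equal_error_rate`, the score direction (higher means more likely spoofed), and the label convention (1 = spoofed, 0 = genuine) are assumptions made for illustration; this is not the ADD2022 scoring tool or the authors' code.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the EER by sweeping thresholds over the observed scores.

    Assumed conventions (illustrative only): a higher score means
    "more likely spoofed", and labels use 1 = spoofed, 0 = genuine.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    pos = scores[labels == 1]  # spoofed utterances
    neg = scores[labels == 0]  # genuine utterances

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "flag nothing" operating point is also considered.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # An utterance is flagged as spoofed when its score >= threshold.
    fpr = np.array([(neg >= t).mean() for t in thresholds])  # genuine flagged
    fnr = np.array([(pos < t).mean() for t in thresholds])   # spoofed missed

    # Take the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(fpr - fnr)))
    return float((fpr[idx] + fnr[idx]) / 2.0)
```

With only a handful of scores the two error curves may never cross exactly, which is why the sketch averages the two rates at the closest threshold instead of interpolating.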

Abstract

The present paper proposes a waveform boundary detection system for audio spoofing attacks containing partially manipulated segments. Partially spoofed (fake) audio, in which part of an utterance is replaced with either synthetic or natural audio clips, has recently been reported as one scenario of audio deepfakes. Because deepfakes can be a threat to social security, detecting such spoofed audio is essential. Accordingly, we propose to address the problem with a deep learning-based frame-level detection system that can detect partially spoofed audio and locate the manipulated pieces. Our proposed method is trained and evaluated on data provided by the ADD2022 Challenge. We evaluate our detection model with respect to various acoustic features and network configurations. As a result, our detection system achieves an equal error rate (EER) of 6.58% on the ADD2022 challenge test set, which is…
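
To make "locate the manipulated pieces" concrete, the sketch below turns a sequence of per-frame scores into time segments. It assumes a model that emits one manipulation score per frame, a fixed decision threshold, and a 20 ms frame shift; the function name `frames_to_segments` and all three of those choices are hypothetical. The abstract does not describe the paper's own frame labels or post-processing, so this is not the authors' procedure.

```python
def frames_to_segments(frame_scores, threshold=0.5, frame_shift_ms=20):
    """Group consecutive above-threshold frames into (start_ms, end_ms) spans.

    All parameters are illustrative assumptions: frame_scores holds one
    manipulation score per frame, and frame_shift_ms is the hop between frames.
    """
    segments = []
    start = None  # index of the first frame of the currently open segment

    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i  # a flagged region begins here
        elif score < threshold and start is not None:
            # The flagged region ended on the previous frame.
            segments.append((start * frame_shift_ms, i * frame_shift_ms))
            start = None

    # Close a region that runs through the final frame.
    if start is not None:
        segments.append((start * frame_shift_ms, len(frame_scores) * frame_shift_ms))

    return segments
```

For example, `frames_to_segments([0.1, 0.9, 0.8, 0.2, 0.7])` returns `[(20, 60), (80, 100)]`: frames 1 and 2 form one span, and the final frame forms a second span that ends at the end of the clip.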

Taxonomy

Topics: Digital Media Forensic Detection · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Test