Split and Conquer Partial Deepfake Speech
Inbal Rimon, Oren Gal, Haim Permuter

TL;DR
This paper introduces a split-and-conquer framework for partial deepfake speech detection, improving localization and overall accuracy by separating boundary detection from segment classification.
Contribution
The paper proposes a two-stage approach, boundary detection followed by segment classification, together with a reflection-based multi-length training strategy for robustness.
Findings
Achieves state-of-the-art results on the PartialSpoof benchmark.
Improves detection accuracy at multiple temporal resolutions.
Demonstrates robustness and generalization on the Half-Truth dataset.
Abstract
Partial deepfake speech detection requires identifying manipulated regions that may occur within short temporal portions of an otherwise bona fide utterance, making the task particularly challenging for conventional utterance-level classifiers. We propose a split-and-conquer framework that decomposes the problem into two stages: boundary detection and segment-level classification. A dedicated boundary detector first identifies temporal transition points, allowing the audio signal to be divided into segments that are expected to contain acoustically consistent content. Each resulting segment is then evaluated independently to determine whether it corresponds to bona fide or fake speech. This formulation simplifies the learning objective by explicitly separating temporal localization from authenticity assessment, allowing each component to focus on a well-defined task. To further…
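The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `split_at_boundaries`, `split_and_conquer`, `detect_boundaries`, `score_segment`, and the 0.5 threshold are hypothetical, and the toy detector and scorer at the bottom are placeholders for trained models. The sketch only shows how predicted transition points could be turned into segments that are then labeled independently.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start, end) sample indices, end exclusive


def split_at_boundaries(num_samples: int, boundaries: Sequence[int]) -> List[Segment]:
    """Turn predicted transition points into contiguous, non-overlapping segments."""
    if num_samples == 0:
        return []
    # Keep only interior boundaries, deduplicated and sorted.
    cuts = sorted({b for b in boundaries if 0 < b < num_samples})
    edges = [0] + cuts + [num_samples]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]


def split_and_conquer(
    signal: Sequence[float],
    detect_boundaries: Callable[[Sequence[float]], Sequence[int]],  # stage 1 (assumed interface)
    score_segment: Callable[[Sequence[float]], float],              # stage 2 (assumed interface)
    threshold: float = 0.5,                                         # illustrative value only
) -> List[Tuple[int, int, str]]:
    """Stage 1 localizes transitions; stage 2 labels each segment on its own."""
    labeled = []
    for start, end in split_at_boundaries(len(signal), detect_boundaries(signal)):
        score = score_segment(signal[start:end])
        labeled.append((start, end, "fake" if score >= threshold else "bona fide"))
    return labeled


# Toy stand-ins for the two models, used only to make the sketch runnable.
def toy_boundaries(signal: Sequence[float]) -> List[int]:
    # Flag a boundary wherever consecutive samples jump by more than 0.5.
    return [i for i in range(1, len(signal)) if abs(signal[i] - signal[i - 1]) > 0.5]


def toy_score(segment: Sequence[float]) -> float:
    # Mean absolute amplitude as a placeholder "fakeness" score.
    return sum(abs(x) for x in segment) / len(segment)


signal = [0.0] * 4 + [1.0] * 3 + [0.0] * 3
result = split_and_conquer(signal, toy_boundaries, toy_score)
print(result)
```

Because the segment scorer only ever sees audio between two boundaries, each stage can be trained and evaluated on its own task, which is the separation of localization from authenticity assessment that the abstract describes.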
