TL;DR
This paper introduces a coarse-to-fine framework that detects partially forged audio and localizes the forged segments, targeting subtle manipulations within long audio clips.
Contribution
The proposed CFPRF framework pairs a frame-level detection network (FDN) with a proposal refinement network (PRN), moving beyond classification-only countermeasures to predict the start and end timestamps of forged segments.
Findings
Achieves state-of-the-art results on multiple datasets
Effectively localizes partial audio forgeries with high precision
Improves detection robustness through contrastive and boundary-aware features
Abstract
Recently, a novel form of partial audio forgery has posed challenges to audio forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve only a classification purpose and fail to meaningfully analyze the start and end timestamps of partial forgery segments. To address this challenge, we introduce a novel coarse-to-fine proposal refinement framework (CFPRF) that incorporates a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization. Specifically, the FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions. The PRN is responsible for predicting confidence scores and regression offsets to refine the…
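The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the thresholding rule, the 0.5 threshold, the 20 ms frame duration, and the function names `coarse_proposals` and `refine_proposal` are all assumptions made for illustration. The first function turns per-frame fake scores (the kind of output a frame-level detector might produce) into rough segment proposals; the second applies start and end regression offsets (the kind of output a refinement network might predict) to one proposal.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def coarse_proposals(
    fake_scores: Sequence[float],
    threshold: float = 0.5,   # assumed decision threshold, not from the paper
    frame_sec: float = 0.02,  # assumed frame duration, not from the paper
) -> List[Segment]:
    """Group consecutive frames scored as fake into coarse segments."""
    proposals: List[Segment] = []
    start = None
    for i, score in enumerate(fake_scores):
        if score >= threshold and start is None:
            start = i  # a fake run begins at this frame
        elif score < threshold and start is not None:
            proposals.append((start * frame_sec, i * frame_sec))
            start = None  # the run ended just before this frame
    if start is not None:
        # The final run extends to the end of the clip.
        proposals.append((start * frame_sec, len(fake_scores) * frame_sec))
    return proposals


def refine_proposal(
    proposal: Segment, start_offset: float, end_offset: float
) -> Segment:
    """Shift segment boundaries by hypothetical regression offsets (seconds)."""
    start, end = proposal
    new_start = max(0.0, start + start_offset)
    new_end = max(new_start, end + end_offset)  # keep the segment non-negative
    return (new_start, new_end)


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2, 0.7]
    for segment in coarse_proposals(scores):
        print(refine_proposal(segment, start_offset=-0.005, end_offset=0.005))
```

In a learned system the offsets and a confidence score per proposal would come from a trained network rather than being passed in by hand; the sketch only shows how coarse frame-level decisions and boundary offsets fit together.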
