Weakly-supervised Audio Temporal Forgery Localization via Progressive Audio-language Co-learning Network

Junyan Wu; Wenbo Xu; Wei Lu; Xiangyang Luo; Rui Yang; Shize Guo

arXiv:2505.01880 · cs.SD · May 8, 2025

TL;DR

This paper introduces a weakly-supervised audio forgery localization method that uses progressive co-learning and self-supervision to accurately identify forged regions without requiring fine-grained annotations.
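
The output of a localizer like this is a set of forged time regions. The abstract does not describe LOCO's post-processing, so the following is only a minimal NumPy sketch, under stated assumptions, of one common way to turn per-frame forgery scores into regions: threshold the scores and merge consecutive flagged frames. The function name `scores_to_segments`, the 0.5 threshold and the 20 ms frame hop are hypothetical choices for illustration.

```python
import numpy as np

def scores_to_segments(frame_scores, threshold=0.5, hop_s=0.02):
    """Group consecutive above-threshold frames into (start_s, end_s) regions.

    Hypothetical helper: threshold and hop_s (frame hop in seconds) are
    illustrative assumptions, not values from the paper.
    """
    flags = np.asarray(frame_scores) >= threshold
    segments, start = [], None
    for i, is_fake in enumerate(flags):
        if is_fake and start is None:
            start = i  # a forged run begins at frame i
        elif not is_fake and start is not None:
            segments.append((start * hop_s, i * hop_s))  # run ended at frame i
            start = None
    if start is not None:
        # close a run that lasts until the final frame
        segments.append((start * hop_s, len(flags) * hop_s))
    return segments
```

For example, `scores_to_segments([0.1, 0.8, 0.9, 0.2, 0.7], hop_s=0.5)` yields two regions, `(0.5, 1.5)` and `(2.0, 2.5)`, in seconds.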

Contribution

The paper proposes a novel progressive audio-language co-learning network (LOCO) that leverages semantic priors and self-supervised refinement for audio forgery localization under weak supervision.

Findings

1. Achieves state-of-the-art performance on three benchmarks.
2. Effectively utilizes semantic priors for forgery detection.
3. Improves localization accuracy with progressive pseudo-label refinement.
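
The text here does not specify how LOCO's pseudo-labels are built or refined. The sketch below is a generic, hypothetical illustration of frame-level pseudo-labeling for weak supervision. Confident frame scores become hard labels, uncertain frames are ignored in the loss, and every frame of a bona fide utterance is labeled real, because a genuine utterance contains no forged frames. Narrowing the ignore band over rounds is just one possible reading of "progressive." The names `make_pseudo_labels`, `masked_bce`, `progressive_thresholds` and `IGNORE`, as well as all threshold values, are assumptions.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for frames excluded from the loss

def make_pseudo_labels(frame_scores, utt_label, low=0.3, high=0.7):
    """Turn frame scores into hard pseudo-labels: 1 fake, 0 real, IGNORE otherwise."""
    scores = np.asarray(frame_scores, dtype=float)
    if utt_label == 0:
        # A bona fide utterance contains no forged frames.
        return np.zeros(len(scores), dtype=int)
    labels = np.full(len(scores), IGNORE, dtype=int)
    labels[scores >= high] = 1  # confidently forged
    labels[scores <= low] = 0   # confidently genuine
    return labels

def masked_bce(frame_scores, labels, eps=1e-7):
    """Frame-level binary cross-entropy over pseudo-labeled frames only."""
    s = np.clip(np.asarray(frame_scores, dtype=float), eps, 1 - eps)
    mask = labels != IGNORE
    if not mask.any():
        return 0.0
    y = labels[mask]
    return float(-np.mean(y * np.log(s[mask]) + (1 - y) * np.log(1 - s[mask])))

def progressive_thresholds(rounds, start=(0.2, 0.8), end=(0.4, 0.6)):
    """Linearly narrow the (low, high) ignore band so later rounds label more frames."""
    lows = np.linspace(start[0], end[0], rounds)
    highs = np.linspace(start[1], end[1], rounds)
    return [(float(lo), float(hi)) for lo, hi in zip(lows, highs)]
```

With scores `[0.1, 0.25, 0.5, 0.75, 0.9]` on a spoofed utterance, the first round's band `(0.2, 0.8)` labels only the two extreme frames. The last round's band `(0.4, 0.6)` also labels the 0.25 and 0.75 frames and leaves only the 0.5 frame ignored.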

Abstract

Audio temporal forgery localization (ATFL) aims to find the precise forged regions of partially spoofed audio that has been purposefully modified. Existing ATFL methods rely on training efficient networks with fine-grained annotations, which are costly and challenging to obtain in real-world scenarios. To meet this challenge, this paper proposes a progressive audio-language co-learning network (LOCO) that adopts co-learning and self-supervision to improve localization performance in weakly-supervised scenarios. Specifically, an audio-language co-learning module is first designed to capture forgery consensus features by aligning semantics from temporal and global perspectives. In this module, forgery-aware prompts are constructed from utterance-level annotations together with learnable prompts, which can incorporate semantic priors into temporal content features…
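
The abstract is truncated before it gives the module's equations. As a rough intuition for how utterance-level labels and text prompts can supervise frame-level localization, here is a minimal, hypothetical NumPy sketch of a common weak-supervision pattern, not LOCO's implementation (see the official ItzJuny/LOCO repository for that). Each frame feature is scored by temperature-scaled cosine similarity against a "real" and a "fake" prompt embedding. The frame scores are pooled into one utterance score with top-k multiple-instance learning (MIL), and only that pooled score is compared with the utterance label through binary cross-entropy. The function names, the 0.07 temperature and the top-k mean pooling are all assumptions.

```python
import numpy as np

def frame_forgery_scores(frame_feats, prompt_embs, temperature=0.07):
    """Per-frame probability of 'fake' from similarity to two class prompts.

    frame_feats: (T, D) temporal audio features (hypothetical encoder output).
    prompt_embs: (2, D) embeddings of the [real, fake] prompts (hypothetical).
    """
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    p = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    logits = f @ p.T / temperature               # (T, 2) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability for softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, 1]                           # column 1 = 'fake' prompt

def utterance_score(frame_scores, k=3):
    """MIL pooling: mean of the k highest frame scores."""
    k = min(k, len(frame_scores))
    return float(np.sort(frame_scores)[-k:].mean())

def weak_bce(utt_score, utt_label, eps=1e-7):
    """Binary cross-entropy against the utterance-level label only."""
    s = float(np.clip(utt_score, eps, 1 - eps))
    return float(-(utt_label * np.log(s) + (1 - utt_label) * np.log(1 - s)))
```

A toy check: with orthogonal prompts `np.eye(2, 16)` and ten frames that copy the "real" prompt except frames 4 to 6, which copy the "fake" prompt, only frames 4 to 6 receive high scores. The pooled utterance score is then close to 1, so the loss is small for label 1 and large for label 0. Top-k pooling is used here because a partially spoofed utterance may contain only a short forged span, which plain mean pooling would dilute.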

Code & Models

Repositories

ItzJuny/LOCO (official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Music and Audio Processing

Methods: Contrastive Learning