T3-Tracer: A Tri-level Temporal-Aware Framework for Audio Forgery Detection and Localization
Shuhan Xia, Xuannan Liu, Xing Cui, Peipei Li

TL;DR
The paper introduces T3-Tracer, a hierarchical framework that jointly analyzes audio at multiple temporal levels to improve detection and localization of partial audio forgeries, addressing limitations of previous frame-only methods.
Contribution
It presents the first joint multi-level analysis framework for audio forgery detection, combining frame, segment, and audio levels with novel modules for comprehensive forgery trace identification.
Findings
Achieves state-of-the-art performance on three datasets.
Effectively detects both intra-frame and boundary forgeries.
Outperforms existing methods in accuracy and robustness.
Abstract
Recently, partial audio forgery has emerged as a new form of audio manipulation. Attackers selectively modify partial but semantically critical frames while preserving the overall perceptual authenticity, making such forgeries particularly difficult to detect. Existing methods focus on independently detecting whether a single frame is forged, lacking the hierarchical structure to capture both transient and sustained anomalies across different temporal levels. To address these limitations, we identify three key levels relevant to partial audio forgery detection and present T3-Tracer, the first framework that jointly analyzes audio at the frame, segment, and audio levels to comprehensively detect forgery traces. T3-Tracer consists of two complementary core modules: the Frame-Audio Feature Aggregation Module (FA-FAM) and the Segment-level Multi-Scale Discrepancy-Aware Module (SMDAM).…
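To make the three temporal levels concrete, the following is a minimal Python sketch of how per-frame forgery scores could be viewed at the frame, segment, and audio levels, and how forged regions could be localized from them. It is not the authors' FA-FAM or SMDAM: the function names `tri_level_views` and `forged_spans`, the fixed window length, the mean and max pooling choices, and the 0.5 threshold are all assumptions made purely for illustration.

```python
from statistics import mean


def tri_level_views(frame_scores, segment_len=4):
    """Summarise per-frame forgery scores at three temporal levels.

    Hypothetical helper for illustration only; the pooling choices
    (mean per segment, max per audio) are assumptions, not the paper's.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    # Frame level: the raw per-frame scores.
    frames = list(frame_scores)
    # Segment level: mean score over consecutive non-overlapping windows.
    segments = [
        mean(frames[i:i + segment_len])
        for i in range(0, len(frames), segment_len)
    ]
    # Audio level: a single score; max keeps a short forged region visible.
    audio = max(frames) if frames else 0.0
    return frames, segments, audio


def forged_spans(frame_scores, threshold=0.5):
    """Return (start, end) frame-index pairs, end exclusive, above threshold."""
    spans, start = [], None
    for i, score in enumerate(frame_scores):
        if score > threshold and start is None:
            start = i  # a forged run begins
        elif score <= threshold and start is not None:
            spans.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        spans.append((start, len(frame_scores)))  # run reaches the last frame
    return spans


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.7, 0.9]
    frames, segments, audio = tri_level_views(scores, segment_len=4)
    print(segments, audio, forged_spans(scores))
```

In this toy setting the frame view supports localization, the segment view smooths over isolated noisy frames, and the audio view gives one utterance-level decision. A learned model would replace the fixed pooling with trained aggregation.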
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
