LAVA: Layered Audio-Visual Anti-tampering Watermarking for Robust Deepfake Detection and Localization
Bokang Zeng, Zheng Gao, Xiaoyu Li, Xiaoyan Feng, Jiaojiao Jiang

TL;DR
LAVA is a layered audio-visual watermarking framework that makes deepfake tamper detection and localization in short-form videos robust to codec compression and audio-visual misalignment.
Contribution
LAVA introduces a calibration-aware audio-visual fusion approach that keeps tamper evidence reliable under real-world degradations and outperforms existing methods.
Findings
Achieves near-perfect detection performance (AP = 0.999)
Remains robust to compression and multimodal misalignment
Improves tamper localization reliability over existing baselines
Abstract
Proactive watermarking offers a promising approach for deepfake tamper detection and localization in short-form videos. However, existing methods often decouple audio and visual evidence and assume that watermark signals remain reliable under real-world degradations, making tamper localization vulnerable to multimodal misalignment and compression distortions. Moreover, existing semi-fragile visual watermarking methods often degrade significantly under codec compression because their embedding bands overlap with compression-sensitive frequency regions. To address these limitations, we propose Layered Audio-Visual Anti-tampering Watermarking (LAVA), a calibration-aware audio-visual watermark fusion framework for deepfake tamper detection and localization. LAVA leverages cross-modal watermark fusion and calibration-aware alignment to preserve consistent and reliable tamper evidence under…
