Beyond Semantic Priors: Mitigating Optimization Collapse for Generalizable Visual Forensics

Jipeng Liu, Haichao Shi, Siyu Xing, Rong Yin, and Xiao-Yu Zhang

arXiv:2603.24057 · cs.CV · March 26, 2026

TL;DR

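This paper identifies Optimization Collapse, a failure mode in which deepfake detectors trained with Sharpness-Aware Minimization (SAM) degenerate to random guessing on non-semantic forgeries once the perturbation radius exceeds a narrow threshold. It introduces theoretical tools (COR and GSNR) to analyze the collapse and proposes CoRIT, a model that mitigates it and improves generalization in forgery detection.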

Contribution

The work formalizes Optimization Collapse in deepfake detection, links it to intrinsic generalization limits, and introduces CoRIT, a novel model with strategies to enhance robustness and generalization.

Findings

01. CoRIT outperforms existing methods on cross-domain benchmarks.

02. Optimization Collapse is linked to layer-wise GSNR attenuation.

03. Theoretical analysis connects COR, GSNR, and stability in SAM training.
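
To make the GSNR finding concrete, here is a minimal sketch of how a gradient signal-to-noise ratio could be computed. It assumes the commonly used per-parameter definition (squared mean of the per-sample gradient divided by its variance across samples). The function names, the `layer_slices` mapping, and the choice to average per-parameter values within a layer are illustrative assumptions, not the paper's implementation.

```python
from statistics import fmean, pvariance


def gsnr_per_parameter(per_sample_grads, eps=1e-12):
    """Return GSNR for each parameter.

    per_sample_grads: list of gradient vectors, one vector per training sample.
    Assumed definition: GSNR_j = mean(g_j)^2 / var(g_j) across samples.
    """
    n_params = len(per_sample_grads[0])
    ratios = []
    for j in range(n_params):
        column = [g[j] for g in per_sample_grads]
        mean = fmean(column)
        var = pvariance(column)  # population variance across samples
        ratios.append(mean * mean / (var + eps))  # eps avoids division by zero
    return ratios


def layerwise_gsnr(per_sample_grads, layer_slices):
    """Average per-parameter GSNR inside each layer (hypothetical aggregation)."""
    per_param = gsnr_per_parameter(per_sample_grads)
    return {name: fmean(per_param[s]) for name, s in layer_slices.items()}


# Toy example: two samples, two parameters.
grads = [[2.0, 1.0], [4.0, -1.0]]
per_param = gsnr_per_parameter(grads)
per_layer = layerwise_gsnr(grads, {"early": slice(0, 1), "late": slice(1, 2)})
```

In this toy example the first parameter has a consistent gradient direction across samples (high GSNR), while the second has gradients that cancel out (GSNR of zero), which is the kind of attenuation the finding refers to.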

Abstract

While Vision-Language Models (VLMs) like CLIP have emerged as a dominant paradigm for generalizable deepfake detection, a representational disconnect remains: their semantic-centric pre-training is ill-suited for capturing non-semantic artifacts inherent to hyper-realistic synthesis. In this work, we identify a failure mode termed Optimization Collapse, where detectors trained with Sharpness-Aware Minimization (SAM) degenerate to random guessing on non-semantic forgeries once the perturbation radius exceeds a narrow threshold. To theoretically formalize this collapse, we propose the Critical Optimization Radius (COR) to quantify the geometric stability of the optimization landscape, and leverage the Gradient Signal-to-Noise Ratio (GSNR) to measure generalization potential. We establish a theorem proving that COR increases monotonically with GSNR, thereby revealing that the geometric…
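
For readers unfamiliar with the perturbation radius mentioned in the abstract, the following is a minimal sketch of one standard SAM update on a toy quadratic loss. It follows the usual two-step formulation (ascend to a perturbed point at distance `rho` along the normalized gradient, then descend using the gradient measured there). The toy loss, the function names, and the hyperparameter values are assumptions for illustration and are not taken from the paper.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One Sharpness-Aware Minimization step on a list of weights.

    rho is the perturbation radius; grad_fn maps weights to a gradient list.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + eps
    # Ascent step: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed point to w.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w):
    """Gradient of the toy loss 0.5 * (w0^2 + 10 * w1^2)."""
    return [w[0], 10.0 * w[1]]


w_next = sam_step([3.0, 0.0], toy_grad, lr=0.1, rho=0.05)
```

Here `rho` plays the role of the perturbation radius: a larger value evaluates the gradient farther from the current weights. The sketch does not reproduce the collapse behavior described in the abstract; it only shows where the radius enters the update.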

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Domain Adaptation and Few-Shot Learning