ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models
Yingxin Lai, Zitong Yu, Jun Wang, Linlin Shen, Yong Xu, and Xiaochun Cao

TL;DR
ForensicZip is a forgery-driven visual token compression method for forensic vision-language models. It cuts computational cost while maintaining detection accuracy by retaining tokens that carry physical discontinuities and anomalies rather than only semantically salient ones.
Contribution
The paper presents a training-free token pruning framework based on optimal transport that makes forensic detection more efficient without sacrificing performance.
Findings
Achieves a 2.97x speedup at 10% token retention.
Reduces FLOPs by over 90% while maintaining detection accuracy.
Outperforms existing methods on deepfake and AIGC benchmarks.
Abstract
Multimodal Large Language Models (MLLMs) enable interpretable multimedia forensics by generating textual rationales for forgery detection. However, processing dense visual sequences incurs high computational costs, particularly for high-resolution images and videos. Visual token pruning is a practical acceleration strategy, yet existing methods are largely semantic-driven, retaining salient objects while discarding background regions where manipulation traces such as high-frequency anomalies and temporal jitters often reside. To address this issue, we introduce ForensicZip, a training-free framework that reformulates token compression from a forgery-driven perspective. ForensicZip models temporal token evolution as a Birth-Death Optimal Transport problem with a slack dummy node, quantifying physical discontinuities indicating transient generative artifacts. The forensic scoring further…
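The abstract's idea of matching tokens across frames with a slack dummy node can be illustrated with a small sketch. This is not the authors' implementation: the function name `birth_scores`, the cosine cost, the `slack` and `eps` values, and the use of entropic Sinkhorn iterations are all assumptions chosen for illustration. Each current-frame token either matches a previous-frame token or is "born" from the dummy node, and the mass it receives from the dummy node serves as a hypothetical discontinuity score for deciding which tokens to keep.

```python
import numpy as np

def birth_scores(prev_tokens, cur_tokens, slack=0.3, eps=0.1, n_iters=200):
    """Hypothetical sketch: score current tokens by mass received from a dummy node."""
    n, m = len(prev_tokens), len(cur_tokens)
    # Cosine distance between every previous-frame and current-frame token.
    p = prev_tokens / (np.linalg.norm(prev_tokens, axis=1, keepdims=True) + 1e-8)
    c = cur_tokens / (np.linalg.norm(cur_tokens, axis=1, keepdims=True) + 1e-8)
    # Augment with one dummy row (births) and one dummy column (deaths).
    cost = np.full((n + 1, m + 1), slack, dtype=float)
    cost[:n, :m] = 1.0 - p @ c.T
    cost[n, m] = 0.0  # unused dummy mass flows dummy -> dummy for free
    # Marginals: unit mass per real token; each dummy can absorb the other side.
    a = np.append(np.ones(n), m)
    b = np.append(np.ones(m), n)
    # Entropic OT solved with plain Sinkhorn iterations (assumed solver).
    K = np.exp(-cost / eps)
    u = np.ones(n + 1)
    v = np.ones(m + 1)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    # Mass sent from the dummy source to each real current token.
    return plan[n, :m]

# Toy frames: six tokens persist unchanged except token 3, which is replaced
# by a direction absent from the previous frame.
prev = np.eye(6, 8)
cur = prev.copy()
cur[3] = np.eye(8)[7]

scores = birth_scores(prev, cur)
keep = np.argsort(-scores)[:2]  # retain the two highest-scoring tokens
```

In this toy setup the replaced token has no cheap match in the previous frame, so most of its mass comes from the dummy node and it ranks first for retention. The `slack` cost sets how dissimilar a token must be before being born is cheaper than being matched.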
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
