FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

Zhou Liu; Tonghua Su; Hongshi Zhang; Fuxiang Yang; Donglin Di; Yang Song; Lei Fan

arXiv:2602.18880 · cs.CV · February 24, 2026

TL;DR

FOCA is a multimodal framework that improves image forgery detection, localization, and explanation by integrating spatial and frequency domain features with a large language model, supported by a new dataset.

Contribution

The paper introduces FOCA, a multimodal large language model (MLLM)-based framework for forgery detection, localization, and explanation that combines spatial and frequency-domain features and produces interpretable outputs. It also introduces a new dataset, FSE-Set.
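
The frequency branch is only named here, not specified. As a rough illustration of what "frequency features" can mean for forgery detection, the NumPy sketch below computes a centered log-magnitude spectrum and a high-pass residual (the lowest frequencies, which carry most of the coarse semantic layout, are zeroed so fine texture remains) and cuts the result into patch tokens. The function names, the circular cutoff radius, and the patch size are hypothetical choices for illustration; FOCA's actual frequency transform may differ (for example, a DCT or wavelet decomposition).

```python
import numpy as np


def log_spectrum(gray: np.ndarray) -> np.ndarray:
    """Centered log-magnitude spectrum of a 2-D grayscale image (hypothetical helper)."""
    spec = np.fft.fftshift(np.fft.fft2(gray))  # move the zero frequency to the center
    return np.log1p(np.abs(spec))  # compress the large dynamic range


def high_pass_residual(gray: np.ndarray, radius: int = 8) -> np.ndarray:
    """Zero a disc of low frequencies around DC and return the spatial residual."""
    h, w = gray.shape
    spec = np.fft.fftshift(np.fft.fft2(gray))
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2
    keep = (yy - cy) ** 2 + (xx - cx) ** 2 > radius ** 2  # True outside the disc
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * keep)))


def patch_tokens(feature_map: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W) map into non-overlapping patches, flattened to (N, patch * patch)."""
    h, w = feature_map.shape
    h, w = h - h % patch, w - w % patch  # drop any ragged border
    fm = feature_map[:h, :w]
    return (
        fm.reshape(h // patch, patch, w // patch, patch)
        .transpose(0, 2, 1, 3)  # (rows, cols, patch, patch)
        .reshape(-1, patch * patch)
    )


rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for a grayscale image
residual = high_pass_residual(image)
freq_tokens = patch_tokens(residual, patch=16)
print(freq_tokens.shape)  # (16, 256): a 4 x 4 grid of 16 x 16 patches
```

Because the DC component is removed, a perfectly flat image yields an all-zero residual; only local variation such as texture, noise, or resampling artifacts survives into the tokens.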

Findings

1. FOCA outperforms existing methods in detection accuracy.
2. FOCA provides human-interpretable explanations of forgeries.
3. Extensive experiments validate FOCA's effectiveness across domains.

Abstract

Advances in image tampering techniques, particularly generative models, pose significant challenges to media verification, digital forensics, and public trust. Existing image forgery detection and localization (IFDL) methods suffer from two key limitations: over-reliance on semantic content while neglecting textural cues, and limited interpretability of subtle low-level tampering traces. To address these issues, we propose FOCA, a multimodal large language model-based framework that integrates discriminative features from both the RGB spatial and frequency domains via a cross-attention fusion module. This design enables accurate forgery detection and localization while providing explicit, human-interpretable cross-domain explanations. We further introduce FSE-Set, a large-scale dataset with diverse authentic and tampered images, pixel-level masks, and dual-domain annotations. Extensive…
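
The abstract names a cross-attention fusion module but does not describe its internals. The NumPy sketch below shows one common form of cross-attention, assuming RGB patch tokens act as queries and frequency-domain tokens supply keys and values, with a residual connection back to the spatial stream. The class name CrossDomainFusion, the token counts, the dimensions, the single head, and the random (untrained) projection weights are all hypothetical; this is not FOCA's published architecture.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class CrossDomainFusion:
    """Hypothetical single-head cross-attention: spatial tokens attend to frequency tokens."""

    def __init__(self, d_spatial: int, d_freq: int, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random projections stand in for learned weights (scaled by 1 / sqrt(fan_in)).
        self.w_q = rng.normal(0.0, 1.0 / np.sqrt(d_spatial), (d_spatial, d_model))
        self.w_k = rng.normal(0.0, 1.0 / np.sqrt(d_freq), (d_freq, d_model))
        self.w_v = rng.normal(0.0, 1.0 / np.sqrt(d_freq), (d_freq, d_model))
        self.w_o = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, d_spatial))

    def __call__(self, spatial: np.ndarray, freq: np.ndarray):
        q = spatial @ self.w_q  # (N_s, d_model) queries from RGB tokens
        k = freq @ self.w_k  # (N_f, d_model) keys from frequency tokens
        v = freq @ self.w_v  # (N_f, d_model) values from frequency tokens
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N_s, N_f); each row sums to 1
        fused = spatial + (attn @ v) @ self.w_o  # residual keeps the spatial stream intact
        return fused, attn


rng = np.random.default_rng(1)
spatial_tokens = rng.normal(size=(196, 768))  # e.g. a 14 x 14 grid of RGB patch embeddings
freq_tokens = rng.normal(size=(16, 256))  # e.g. 16 frequency-domain patch tokens
fusion = CrossDomainFusion(d_spatial=768, d_freq=256, d_model=128)
fused, attn = fusion(spatial_tokens, freq_tokens)
print(fused.shape, attn.shape)  # (196, 768) (196, 16)
```

Letting the spatial tokens query the frequency tokens keeps the fused output aligned with the RGB patch grid, which is convenient if the result feeds a language model or a pixel-level localization head; the opposite direction, or attention in both directions, is an equally plausible design.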

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning