HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection
Jialei Cui, Jianwei Du, Yanzhe Li, Lei Gao, Hui Jiang, Chenfu Bao

TL;DR
HAMLET-FFD is a cognitively inspired, hierarchical multi-modal framework for face forgery detection. It integrates visual and textual cues through a bidirectional fusion mechanism, which improves cross-domain generalization.
Contribution
It introduces a novel hierarchical multi-modal learning approach that combines a knowledge refinement loop with bidirectional fusion. The approach builds on CLIP without fine-tuning it.
Findings
Outperforms existing methods on unseen manipulations
Enhances cross-domain generalization in face forgery detection
Reveals specialized embeddings for artifact recognition
Abstract
The rapid evolution of face manipulation techniques poses a critical challenge for face forgery detection: cross-domain generalization. Conventional methods, which rely on simple classification objectives, often fail to learn domain-invariant representations. We propose HAMLET-FFD, a cognitively inspired Hierarchical Adaptive Multi-modal Learning framework that tackles this challenge via bidirectional cross-modal reasoning. Building on contrastive vision-language models such as CLIP, HAMLET-FFD introduces a knowledge refinement loop that iteratively assesses authenticity by integrating visual evidence with conceptual cues, emulating expert forensic analysis. A key innovation is a bidirectional fusion mechanism in which textual authenticity embeddings guide the aggregation of hierarchical visual features, while modulated visual features refine text embeddings to generate image-adaptive…
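The abstract describes the bidirectional fusion only in words, so the Python sketch below is one possible reading of it, not the authors' method. It makes several assumptions, and none of the names or formulas come from the paper:

- It takes one feature vector per layer from a frozen CLIP-style image encoder.
- It uses two authenticity text embeddings, "real" and "fake".
- Text embeddings weight the layers through a softmax over cosine scores.
- The pooled visual feature is added back into the text embeddings with a hypothetical strength `alpha`.
- A fixed number of hypothetical refinement `steps` stands in for the knowledge refinement loop.
- The final probabilities come from a temperature-scaled softmax with hypothetical temperature `tau`.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    # L2-normalize; a zero vector is returned unchanged.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_aggregation(
    layer_feats: Sequence[Sequence[float]], text_embs: Sequence[Vector]
) -> Tuple[Vector, Vector]:
    # Assumption: a layer's score is its best cosine similarity to any
    # (unit-norm) authenticity concept, so concept-relevant layers get more weight.
    scores = [max(dot(normalize(f), t) for t in text_embs) for f in layer_feats]
    weights = softmax(scores)
    dim = len(layer_feats[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, layer_feats)) for i in range(dim)]
    return normalize(pooled), weights


def visual_guided_text_refinement(
    text_embs: Sequence[Vector], visual: Vector, alpha: float
) -> List[Vector]:
    # Assumption: shift each text embedding toward the pooled visual feature,
    # giving image-adaptive text embeddings (re-normalized to unit length).
    return [normalize([t_i + alpha * v_i for t_i, v_i in zip(t, visual)]) for t in text_embs]


def bidirectional_fusion(
    layer_feats: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    steps: int = 3,      # hypothetical number of refinement iterations
    alpha: float = 0.1,  # hypothetical text-refinement strength
    tau: float = 0.07,   # hypothetical softmax temperature
) -> Vector:
    if steps < 1:
        raise ValueError("steps must be >= 1")
    texts = [normalize(t) for t in text_embs]
    visual: Vector = []
    for _ in range(steps):
        # Text -> vision: text embeddings decide how layers are pooled.
        visual, _ = text_guided_aggregation(layer_feats, texts)
        # Vision -> text: the pooled feature refines the text embeddings.
        texts = visual_guided_text_refinement(texts, visual, alpha)
    # Class probabilities from cosine similarity to the refined text embeddings.
    logits = [dot(visual, t) / tau for t in texts]
    return softmax(logits)


if __name__ == "__main__":
    # Toy 3-D vectors standing in for per-layer CLIP features (illustrative only).
    feats = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1], [0.1, 0.0, 1.0]]
    concepts = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical "real", "fake"
    print(bidirectional_fusion(feats, concepts))
```

The layer weighting uses the maximum over concepts. Under that choice, a layer that strongly matches either "real" or "fake" counts as informative. Taking the difference between the two similarities would be an equally plausible alternative.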
