# Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

**Authors:** Hao Tan, Jun Lan, Zichang Tan, Ajian Liu, Chuanbiao Song, Senyuan Shi, Huijia Zhu, Weiqiang Wang, Jun Wan, Zhen Lei

arXiv: 2508.21048 · 2026-03-02

## TL;DR

This paper introduces Veritas, a deepfake detector built on a multi-modal large language model (MLLM) that uses pattern-aware reasoning. Veritas is trained and evaluated on HydraFake, a new and challenging dataset designed to test generalization to unseen forgeries and data domains.

## Contribution

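The paper addresses the gap between academic benchmarks and real-world deepfake detection with two contributions: HydraFake, a dataset with a hierarchical generalization-testing protocol, and Veritas, an MLLM-based detector that combines pattern-aware reasoning with a two-stage training pipeline.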

## Key findings

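- Previous detectors generalize well in cross-model scenarios but fall short on unseen forgery techniques and novel data domains; Veritas achieves significant gains across these out-of-distribution (OOD) scenarios.
- HydraFake offers a more realistic benchmark, combining diverse deepfake techniques and in-the-wild forgeries with a protocol that covers unseen model architectures, emerging forgery techniques, and novel data domains.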
- Veritas offers transparent and faithful detection outputs.
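Because the findings above are stated per generalization scenario, HydraFake results are naturally reported as one score per tier rather than a single pooled number. The sketch below assumes a simple per-sample record format; the tier names (`cross-model`, `cross-forgery`, `cross-domain`), the `Prediction` class, and the `accuracy_by_scenario` helper are hypothetical and not taken from the paper's code, and the toy records are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Prediction:
    scenario: str  # hypothetical tier label, e.g. "cross-model"
    label: int     # ground truth: 1 = fake, 0 = real
    pred: int      # detector decision: 1 = fake, 0 = real


def accuracy_by_scenario(preds):
    """Return {scenario: accuracy}, one entry per generalization tier."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p in preds:
        total[p.scenario] += 1
        correct[p.scenario] += int(p.pred == p.label)
    return {s: correct[s] / total[s] for s in total}


if __name__ == "__main__":
    # Made-up toy records for illustration only, not results from the paper.
    toy = [
        Prediction("cross-model", 1, 1),
        Prediction("cross-model", 0, 1),
        Prediction("cross-forgery", 1, 1),
        Prediction("cross-forgery", 0, 0),
        Prediction("cross-domain", 1, 0),
        Prediction("cross-domain", 0, 0),
    ]
    for scenario, acc in sorted(accuracy_by_scenario(toy).items()):
        print(f"{scenario}: {acc:.2f}")
```

Reporting each tier separately keeps a strong cross-model score from masking weak performance on unseen forgeries or data domains, which is the gap the abstract describes for previous detectors.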

## Abstract

Deepfake detection remains a formidable challenge due to the complex and evolving nature of fake content in real-world scenarios. However, existing academic benchmarks suffer from severe discrepancies with industrial practice, typically featuring homogeneous training sources and low-quality testing images, which hinder the practical deployment of current detectors. To mitigate this gap, we introduce HydraFake, a dataset that simulates real-world challenges with hierarchical generalization testing. Specifically, HydraFake involves diversified deepfake techniques and in-the-wild forgeries, along with a rigorous training and evaluation protocol, covering unseen model architectures, emerging forgery techniques and novel data domains. Building on this resource, we propose Veritas, a multi-modal large language model (MLLM) based deepfake detector. Unlike vanilla chain-of-thought (CoT), we introduce pattern-aware reasoning that involves critical reasoning patterns such as "planning" and "self-reflection" to emulate the human forensic process. We further propose a two-stage training pipeline to seamlessly internalize such deepfake reasoning capacities into current MLLMs. Experiments on the HydraFake dataset reveal that although previous detectors show great generalization in cross-model scenarios, they fall short on unseen forgeries and data domains. Veritas achieves significant gains across different OOD scenarios, and is capable of delivering transparent and faithful detection outputs.
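Pattern-aware reasoning yields a structured trace rather than a bare label, so anything consuming the detector's output has to split the trace into its stages and extract the final verdict. The sketch below assumes, hypothetically, that each stage is wrapped in an XML-style tag such as `<planning>`, `<reflection>`, or `<answer>`. The paper names "planning" and "self-reflection" as reasoning patterns, but these exact tags and the `parse_trace` and `verdict` helpers are illustrative assumptions, not Veritas's actual output format.

```python
import re

# Hypothetical stage tags; the exact strings are assumptions, not Veritas's format.
STAGE_RE = re.compile(
    r"<(planning|reasoning|reflection|answer)>(.*?)</\1>", re.DOTALL
)


def parse_trace(text):
    """Split a tagged reasoning trace into {stage: content}."""
    return {tag: body.strip() for tag, body in STAGE_RE.findall(text)}


def verdict(text):
    """Return "real" or "fake", or None if no well-formed answer is found."""
    answer = parse_trace(text).get("answer", "").lower()
    return answer if answer in ("real", "fake") else None


if __name__ == "__main__":
    # Invented example trace for illustration only.
    trace = (
        "<planning>Check facial boundaries, then lighting.</planning>\n"
        "<reasoning>The jawline blends unnaturally into the neck.</reasoning>\n"
        "<reflection>Compression alone would not explain this seam.</reflection>\n"
        "<answer>Fake</answer>"
    )
    print(sorted(parse_trace(trace)))  # names of the stages found in the trace
    print(verdict(trace))
```

Returning `None` for a missing or malformed answer, instead of defaulting to a label, keeps unparseable traces visible when scoring.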

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21048/full.md

## Figures

55 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21048/full.md

## References

112 references — full list in the complete paper: https://tomesphere.com/paper/2508.21048/full.md

---
Source: https://tomesphere.com/paper/2508.21048