# Decoding deception: state-of-the-art approaches to deep fake detection

**Authors:** Tarak Hussain, B. Tirapathi Reddy, Kondaveti Phanindra, Sailaja Terumalasetti, Ghufran Ahmad Khan

PMC · DOI: 10.3389/fdata.2025.1670833 · Frontiers in Big Data · 2026-01-09

## TL;DR

This paper introduces a new framework that detects deepfakes by analyzing audio-visual inconsistencies. It achieves 98.76% accuracy and stays robust across multiple compression levels.

## Contribution

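The novel contribution is a cross-modal detection framework that combines Synchronization-Aware Feature Fusion (SAFF) with Cross-Modal Graph Attention Networks (CM-GAN). It also introduces a self-supervised pre-training strategy that requires 65% less labeled data.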
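The summary does not say which self-supervised objective the authors use. A common choice for audio-visual pre-training is an InfoNCE-style contrastive loss, and the sketch below shows that technique as an assumption. The audio and video embeddings of the same clip form a positive pair, and every other pairing in the batch acts as a negative. The function names, the temperature of 0.1, and the use of cosine similarity are illustrative choices, not details taken from the paper.

```python
import math

# Hypothetical illustration of contrastive audio-visual pre-training;
# not the paper's confirmed objective.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def info_nce(audio, video, temperature=0.1):
    """Audio-to-video InfoNCE loss averaged over a batch.

    audio[i] and video[i] come from the same clip (the positive pair);
    every video[j] with j != i serves as a negative for audio[i].
    """
    n = len(audio)
    total = 0.0
    for i in range(n):
        logits = [cosine(audio[i], video[j]) / temperature for j in range(n)]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # negative log-probability assigned to the true pairing
        total += log_denominator - logits[i]
    return total / n
```

With two orthogonal clips, a correctly paired batch scores close to 0, while swapping the video embeddings raises the loss to roughly 10 (that is, 1/temperature). A symmetric variant would average this with the video-to-audio direction.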

## Key findings

- The framework achieves 98.76% accuracy across five benchmark datasets with 93,750 test samples.
- It shows a 17.85% generalization advantage over unimodal methods.
- Synchronized audio-visual inconsistencies are highly discriminative (Cohen's d = 1.87).
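Cohen's d expresses the difference between two group means in units of their pooled standard deviation. A value of d = 1.87 therefore means the two group means lie about 1.87 pooled standard deviations apart. The sketch below computes d with the pooled sample standard deviation. This summary does not say which quantity the authors compared (for example, an inconsistency score on real versus fake clips), and the sample values in the usage line are made up.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # variance() is the sample variance (divides by n - 1)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical scores for two groups of clips
print(cohens_d([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))  # → 2.0
```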

## Abstract

Deepfake technology is evolving at an alarming pace, threatening information integrity and social trust. We present a new multimodal deepfake detection framework that exploits cross-domain inconsistencies by examining audio-visual consistency. Its core is the Synchronization-Aware Feature Fusion (SAFF) architecture combined with Cross-Modal Graph Attention Networks (CM-GAN), both of which explicitly address temporal misalignments to improve detection accuracy. Across eight models and five benchmark datasets with 93,750 test samples, the framework achieves 98.76% accuracy and shows significant robustness across multiple compression levels. Statistical analysis shows that synchronized audio-visual inconsistencies are highly discriminative (Cohen's d = 1.87). The contributions are a cross-modal feature extraction pipeline, a graph-based attention mechanism for inter-modal reasoning, and extensive ablation studies validating the fusion strategy. The paper also provides statistically sound insights to guide future research in this area. With a 17.85% generalization advantage over unimodal methods, the framework represents a new state of the art, and it introduces a self-supervised pre-training strategy that requires 65% less labeled data.
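One simple way to make temporal misalignment concrete is to search over audio-to-video lags for the offset that maximizes frame-wise agreement between the two streams. Genuine footage would be expected to peak near zero lag with high similarity, while a desynchronized or poorly dubbed clip may peak elsewhere or only weakly. This is a hypothetical illustration of the idea, not the SAFF or CM-GAN architecture. It assumes per-frame audio and visual embeddings of equal dimension, for instance from a speech encoder and a lip-region encoder. The function names and the lag convention are chosen for illustration.

```python
import math

# Hypothetical illustration of audio-visual synchronization scoring;
# not the paper's SAFF or CM-GAN implementation.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_av_offset(audio, visual, max_lag):
    """Return (lag, mean similarity) for the lag with the best audio/visual agreement.

    A positive lag pairs visual[t] with audio[t + lag], i.e. the audio
    trails the video by `lag` frames. Only overlapping frames are compared.
    """
    best = (0, -float("inf"))
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(audio[t + lag], visual[t])
                 for t in range(len(visual))
                 if 0 <= t + lag < len(audio)]
        if not pairs:
            continue
        score = sum(cosine(a, v) for a, v in pairs) / len(pairs)
        if score > best[1]:
            best = (lag, score)
    return best
```

A detector built on this idea would treat a large best lag, or a low peak similarity, as a sign of manipulation. A learned fusion model can capture subtler inconsistencies than this single global offset.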

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12827133/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12827133/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12827133/full.md

---
Source: https://tomesphere.com/paper/PMC12827133