Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge

Sharayu Nilesh Deshmukh; Kailash A. Hambarde; Joana C. Costa; Hugo Proença; Tiago Roxo

arXiv:2604.28022·cs.CV·May 1, 2026

TL;DR

This paper introduces a new evaluation setup for DeepFake detection that targets semantic inconsistencies between authentic audio and video, reveals limitations of current models on such data, and proposes a semantic reinforcement strategy to address them.

Contribution

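It extends the existing four-class formulation with a class that models semantic mismatch between authentic audio and video, Real Audio-Real Video with Semantic Mismatch (RARV-SMM), introduces three RARV-SMM variants that expose architectural vulnerabilities, and proposes a semantic reinforcement strategy based on ImageBind embeddings.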
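
As a rough illustration of how the extended label space could be organized, the sketch below enumerates the four manipulation classes plus the semantic mismatch class. Only the RARV-SMM name comes from the paper; the other class abbreviations, the `AVLabel` enum, and the `assign_label` helper are hypothetical names chosen for this example and may not match the authors' code.

```python
from enum import Enum


class AVLabel(Enum):
    """Five-way label space (abbreviations other than RARV-SMM are assumed)."""
    RARV = "real audio, real video"
    RAFV = "real audio, fake video"
    FARV = "fake audio, real video"
    FAFV = "fake audio, fake video"
    RARV_SMM = "real audio, real video, semantic mismatch"


def assign_label(audio_is_real: bool,
                 video_is_real: bool,
                 semantically_matched: bool = True) -> AVLabel:
    """Map per-modality authenticity and semantic agreement to a class.

    Semantic mismatch is only considered when both modalities are authentic,
    which is the case the new class is meant to cover.
    """
    if audio_is_real and video_is_real:
        return AVLabel.RARV if semantically_matched else AVLabel.RARV_SMM
    if audio_is_real:
        return AVLabel.RAFV
    if video_is_real:
        return AVLabel.FARV
    return AVLabel.FAFV


if __name__ == "__main__":
    # Both streams are authentic, but their content does not agree.
    print(assign_label(True, True, semantically_matched=False).name)
```

A detector that checks only source integrity would treat the `RARV` and `RARV_SMM` cases identically, since neither modality is manipulated. That is the gap the new class is intended to expose.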

Findings

01. State-of-the-art models struggle with semantic mismatch data.

02. Three RARV-SMM variants reveal architectural vulnerabilities.

03. Semantic reinforcement improves detection performance.
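
This page does not describe how the semantic reinforcement is computed. One common way to compare modalities that share an embedding space is cosine similarity, and the sketch below shows that idea only. It is an assumption for illustration, not the authors' method: the vectors are toy values rather than ImageBind outputs, and `flag_semantic_mismatch` and its `threshold` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def flag_semantic_mismatch(audio_emb: Sequence[float],
                           video_emb: Sequence[float],
                           threshold: float = 0.5) -> bool:
    """Flag a clip when audio and video embeddings disagree.

    The threshold is a hypothetical value; in practice it would have to be
    tuned on held-out data.
    """
    return cosine_similarity(audio_emb, video_emb) < threshold


if __name__ == "__main__":
    # Toy vectors standing in for embeddings from a shared audio-visual space.
    matched = flag_semantic_mismatch([1.0, 0.0], [1.0, 0.0])
    mismatched = flag_semantic_mismatch([1.0, 0.0], [0.0, 1.0])
    print(matched, mismatched)
```

A score of this kind depends only on content agreement between the two streams, so it can in principle separate an authentic but mismatched pair from an authentic matched pair, which source-integrity cues alone cannot do.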

Abstract

Current DeepFake detection scenarios are mostly binary, yet data manipulation can affect audio, video, or both, and this variability is not captured in binary settings. Four-class audio-visual formulations address this by discriminating manipulation type, but introduce an unresolved problem: models may rely solely on data source integrity to detect DeepFakes without evaluating their semantic consistency. If the DeepFake origin is not in the data source but in its content, can semantic mismatch be assessed by the state-of-the-art? This paper proposes a new evaluation setup, extending the four-class formulation by explicitly modeling semantic-level inconsistency between authentic modalities with the introduction of a new class: Real Audio-Real Video with Semantic Mismatch (RARV-SMM). We assess the robustness of state-of-the-art models in this new realistic DeepFake setting, using the…
