Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content

Rohit Kundu; Hao Xiong; Vishal Mohanty; Athula Balachandran; Amit K. Roy-Chowdhury

arXiv:2412.12278 · cs.CV · September 4, 2025

TL;DR

This paper introduces UNITE, a versatile transformer-based model that detects a wide range of synthetic videos, including fully AI-generated content and background manipulations, surpassing face-centric detection methods.

Contribution

The paper presents UNITE, a novel full-frame video tampering detector that extends beyond faces to background manipulations and fully synthetic videos, using domain-agnostic features and an attention-diversity loss.
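
The summary names an attention-diversity loss but does not state its formula. As a hedged illustration only, the sketch below shows one generic way to push attention heads toward different spatial regions: penalize the mean pairwise cosine similarity between the heads' attention maps and add that penalty to the classification loss. The helper names (`attention_diversity_penalty`, `total_loss`) and the weight `lam` are hypothetical, and this is not claimed to be UNITE's actual formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def attention_diversity_penalty(head_maps):
    """Hypothetical diversity term: mean pairwise cosine similarity between heads.

    head_maps: list of H flattened, non-negative spatial attention maps.
    For non-negative maps the result lies in [0, 1]; lower values mean the
    heads attend to more distinct regions.
    """
    h = len(head_maps)
    if h < 2:
        return 0.0  # a single head has nothing to be diverse from
    total, pairs = 0.0, 0
    for i in range(h):
        for j in range(i + 1, h):
            total += cosine(head_maps[i], head_maps[j])
            pairs += 1
    return total / pairs


def total_loss(ce_loss, head_maps, lam=0.5):
    # Assumed combination: classification loss plus a weighted diversity penalty.
    # The weight lam is illustrative, not a value taken from the paper.
    return ce_loss + lam * attention_diversity_penalty(head_maps)


# Two heads that mostly attend to opposite ends of a 3-position map.
maps = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
print(round(attention_diversity_penalty(maps), 3))  # 0.333
```

Minimizing this term alongside the classification loss rewards heads whose maps overlap less; identical maps incur the maximum penalty of 1.0, and disjoint maps incur none.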

Findings

01 UNITE outperforms existing detectors in cross-dataset evaluations.

02 UNITE effectively detects fully synthetic text-to-video (T2V) and image-to-video (I2V) content.

03 The attention-diversity loss improves detection across diverse scenarios.

Abstract

Existing DeepFake detection techniques primarily focus on facial manipulations, such as face-swapping or lip-syncing. However, advancements in text-to-video (T2V) and image-to-video (I2V) generative models now allow fully AI-generated synthetic content and seamless background alterations, challenging face-centric detection methods and demanding more versatile approaches. To address this, we introduce the Universal Network for Identifying Tampered and synthEtic videos (UNITE) model, which, unlike traditional detectors, captures full-frame manipulations. UNITE extends detection capabilities to scenarios without faces, non-human subjects, and complex background modifications. It leverages a transformer-based architecture that processes domain-agnostic features extracted from videos via the SigLIP-So400M…
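
The abstract describes a transformer that consumes domain-agnostic features extracted from video frames. A minimal sketch of that data flow, assuming one feature vector per frame, a single self-attention layer with a residual connection, and a prepended class token read out by a linear head, might look like this. Random arrays stand in for the SigLIP-So400M features, and `init_params`, `classify_video`, the class token, and the two-class output are illustrative assumptions rather than the paper's architecture or training setup.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (T, d) tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # (T, T); each row sums to 1
    return attn @ v, attn


def init_params(d, num_classes=2, seed=0):
    # Hypothetical randomly initialized weights; a real model would be trained.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d)
    return {
        "cls_token": rng.normal(size=d) * scale,
        "w_q": rng.normal(size=(d, d)) * scale,
        "w_k": rng.normal(size=(d, d)) * scale,
        "w_v": rng.normal(size=(d, d)) * scale,
        "w_head": rng.normal(size=(d, num_classes)) * scale,
    }


def classify_video(frame_features, params):
    """frame_features: (T, d) per-frame features (placeholders for SigLIP outputs)."""
    cls = params["cls_token"][None, :]                     # (1, d) assumed class token
    tokens = np.concatenate([cls, frame_features], axis=0)  # (T + 1, d)
    out, attn = self_attention(tokens, params["w_q"], params["w_k"], params["w_v"])
    tokens = tokens + out                                   # residual connection
    logits = tokens[0] @ params["w_head"]                   # read out the class token
    return softmax(logits), attn


# Usage: 8 frames with 16-dim placeholder features (assumption, not SigLIP output).
rng = np.random.default_rng(1)
features = rng.normal(size=(8, 16))
probs, attn = classify_video(features, init_params(d=16))
print(probs.shape, attn.shape)  # (2,) (9, 9)
```

Because attention runs over all tokens rather than a face crop, nothing in this flow depends on a face being present, which is the property the abstract emphasizes.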

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Digital Media Forensic Detection

Methods: Softmax · Attention Is All You Need · Focus