UMCL: Unimodal-generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection

Ching-Yi Lai; Chih-Yu Jian; Pei-Cheng Chuang; Chia-Ming Lee; Chih-Chung Hsu; Chiou-Ting Hsu; Chia-Wen Lin

arXiv:2511.18983 · cs.CV · November 25, 2025

Open Access

TL;DR

This paper introduces UMCL, a contrastive learning framework that transforms a single visual modality into multiple robust features, improving deepfake detection across the varying compression rates applied by social media platforms.

Contribution

The paper proposes a new multimodal contrastive learning approach that generates multiple features from a single modality and aligns them explicitly, enhancing robustness and interpretability in deepfake detection.

Findings

01. Achieves superior detection performance across different compression rates.

02. Maintains high accuracy even when individual features degrade.

03. Provides interpretable insights into feature relationships through explicit alignment.

Abstract

In deepfake detection, the varying degrees of compression employed by social media platforms pose significant challenges for model generalization and reliability. Although existing methods have progressed from single-modal to multimodal approaches, they face critical limitations: single-modal methods struggle with feature degradation under data compression in social media streaming, while multimodal approaches require expensive data collection and labeling and suffer from inconsistent modal quality or accessibility in real-world scenarios. To address these challenges, we propose a novel Unimodal-generated Multimodal Contrastive Learning (UMCL) framework for robust cross-compression-rate (CCR) deepfake detection. In the training stage, our approach transforms a single visual modality into three complementary features: compression-robust rPPG signals, temporal landmark dynamics, and…
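The idea of explicitly aligning features derived from the same clip can be illustrated with a generic InfoNCE-style contrastive loss. The following is a minimal sketch under stated assumptions, not the paper's actual objective: the function names, the temperature of 0.1, and the toy embeddings are all hypothetical, and the abstract shown here does not specify which loss UMCL uses.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(a, b):
    """Pairwise cosine similarity between every row of a and every row of b."""
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]

def info_nce(a, b, temperature=0.1):
    """Mean cross-entropy of matching a[i] to b[i] against all b[j] in the batch."""
    sims = cosine_matrix(a, b)
    loss = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(sims)

def symmetric_info_nce(a, b, temperature=0.1):
    """Average the a-to-b and b-to-a directions."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))

# Hypothetical embeddings for two clips from two feature streams
# (for example an rPPG branch and a landmark branch); values are made up.
rppg = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
landmark_aligned = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.1]]
landmark_shuffled = landmark_aligned[::-1]  # clips paired with the wrong partner

aligned_loss = symmetric_info_nce(rppg, landmark_aligned)
shuffled_loss = symmetric_info_nce(rppg, landmark_shuffled)
print(aligned_loss < shuffled_loss)
```

In this toy setup the loss is lower when embeddings from the same clip are paired than when the pairing is shuffled, which is the behavior a contrastive alignment objective rewards.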

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Image and Video Quality Assessment