Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection

Chen Chen; Dion Hoe-Lian Goh

arXiv:2602.01284 · cs.MM · February 3, 2026

Open Access

TL;DR

This study investigates how humans detect deepfake videos by analyzing visual, audio, and knowledge cues, providing insights to improve media literacy and detection strategies.

Contribution

It identifies key multimodal cues and their combinations that influence human detection accuracy, informing the design of more effective media literacy interventions.

Findings

01

Participants were more accurate with real videos than deepfakes.

02

Visual appearance, vocal cues, and intuition often co-occur in successful detection.

03

Certain cue combinations significantly influence detection performance.
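
The abstract attributes these cue combinations to association rule mining. As a rough illustration of how such a rule is usually scored, the sketch below computes the standard support, confidence, and lift of a rule "antecedent cues -> consequent". The function name `rule_metrics`, the cue labels, and the four example responses are hypothetical and invented for illustration; they are not the study's data, and the paper's actual mining procedure and thresholds are not given here.

```python
from typing import Iterable, List, Set, Tuple


def rule_metrics(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is the set of items reported for one judgement,
    e.g. the cues a participant relied on plus an outcome label.
    """
    if not transactions:
        raise ValueError("need at least one transaction")
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)

    n_a = sum(1 for t in transactions if a <= t)         # antecedent present
    n_c = sum(1 for t in transactions if c <= t)         # consequent present
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # both present

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy responses (NOT data from the paper).
responses = [
    {"visual_appearance", "vocal", "intuition", "correct"},
    {"visual_appearance", "vocal", "correct"},
    {"intuition"},
    {"visual_appearance", "vocal", "intuition", "correct"},
]
support, confidence, lift = rule_metrics(
    responses, {"visual_appearance", "vocal"}, {"correct"}
)
```

A lift above 1 means the cue combination appears alongside the outcome more often than it would if the two were independent, which is the usual reading of a "helpful" rule in this kind of analysis.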

Abstract

As deepfake videos become increasingly difficult for people to recognise, understanding the strategies humans use is key to designing effective media literacy interventions. We conducted a study with 195 participants between the ages of 21 and 40, who judged real and deepfake videos, rated their confidence, and reported the cues they relied on across visual, audio, and knowledge strategies. Participants were more accurate with real videos than with deepfakes and showed lower expected calibration error for real content. Through association rule mining, we identified cue combinations that shaped performance. Visual appearance, vocal cues, and intuition often co-occurred in successful identifications, which highlights the importance of multimodal approaches in human detection. Our findings show which cues help or hinder detection and suggest directions for designing media literacy tools that…
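
For readers unfamiliar with expected calibration error (ECE), the sketch below shows one common binned formulation: judgements are grouped by stated confidence, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. It assumes confidence ratings have been rescaled to the range 0 to 1 and uses ten equal-width bins; the paper's own rating scale and binning scheme are not stated here, and the function name `expected_calibration_error` is an illustrative choice.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: stated confidence per judgement, assumed rescaled to [0, 1].
    correct:     1 if the judgement was right, 0 otherwise.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one judgement")

    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(hit for _, hit in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

A lower value means stated confidence tracks actual accuracy more closely. Computing it separately for real and deepfake videos would give the kind of comparison the abstract reports.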

Taxonomy

Topics: Emotion and Mood Recognition · Child Development and Digital Technology · Misinformation and Its Impacts