HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly

Chang Liu; Yunfan Ye; Fan Zhang; Qingyang Zhou; Yuchuan Luo; Zhiping Cai

arXiv:2507.19924 · cs.CV · August 4, 2025


TL;DR

HumanSAM is a novel framework that classifies human-centric forgery videos into spatial, appearance, and motion anomalies, improving interpretability and robustness in forgery detection.

Contribution

The paper proposes HumanSAM, a classification framework that fuses video understanding and depth features, and introduces HFV, the first benchmark dataset for human-centric forgery videos.

Findings

01. HumanSAM outperforms state-of-the-art methods in classification accuracy.

02. The HFV dataset provides comprehensive annotations for forgery types.

03. The rank-based confidence strategy enhances model robustness.

Abstract

Numerous synthesized videos from generative models, especially human-centric ones that simulate realistic human actions, pose significant threats to human information security and authenticity. While progress has been made in binary forgery video detection, the lack of fine-grained understanding of forgery types raises concerns regarding both reliability and interpretability, which are critical for real-world applications. To address this limitation, we propose HumanSAM, a new framework that builds upon the fundamental challenges of video generation models. Specifically, HumanSAM aims to classify human-centric forgeries into three distinct types of artifacts commonly observed in generated content: spatial, appearance, and motion anomaly. To better capture the features of geometry, semantics and spatiotemporal consistency, we propose to generate the human forgery representation by fusing…
