Punching Bag vs. Punching Person: Motion Transferability in Videos

Raiyaan Abdullah; Jared Claypoole; Michael Cogswell; Ajay Divakaran; Yogesh Rawat

arXiv:2508.00085·cs.CV·August 4, 2025

TL;DR

This paper investigates how well action recognition models transfer high-level motion understanding across different contexts, introducing new datasets and benchmarks to evaluate their generalization capabilities and analyzing factors affecting transferability.

Contribution

The study introduces a motion transferability framework with new synthetic and adapted datasets, providing a benchmark for evaluating and understanding motion transfer in action recognition models.

Findings

01

Multimodal models struggle more with fine-grained unknown actions than with coarse ones.

02

The synthetic Syn-TA dataset proves as challenging for models as the real-world datasets.

03

Larger models excel with spatial cues but not with temporal reasoning.

Abstract

Action recognition models demonstrate strong generalization, but can they effectively transfer high-level motion concepts across diverse contexts, even within similar distributions? For example, can a model recognize the broad action "punching" when presented with an unseen variation such as "punching person"? To explore this, we introduce a motion transferability framework with three datasets: (1) Syn-TA, a synthetic dataset with 3D object motions; (2) Kinetics400-TA; and (3) Something-Something-v2-TA, both adapted from natural video datasets. We evaluate 13 state-of-the-art models on these benchmarks and observe a significant drop in performance when recognizing high-level actions in novel contexts. Our analysis reveals: 1) Multimodal models struggle more with fine-grained unknown actions than with coarse ones; 2) The bias-free Syn-TA proves as challenging as real-world datasets, with…
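The evaluation idea in the abstract (recognizing a coarse action such as "punching" when the fine-grained variation is unseen) can be illustrated with a minimal sketch. This is not the authors' protocol, which this page does not describe; the label mapping `FINE_TO_COARSE`, the functions `coarse_accuracy` and `transfer_drop`, and the "known"/"unknown" split names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fine-to-coarse label mapping (illustrative, not taken from the paper).
FINE_TO_COARSE = {
    "punching bag": "punching",
    "punching person": "punching",
    "kicking ball": "kicking",
    "kicking person": "kicking",
}


def coarse_accuracy(samples, predict, seen_fine_labels):
    """Compute coarse-action accuracy separately for known and unknown variations.

    samples: iterable of (video, fine_label) pairs.
    predict: callable mapping a video to a predicted coarse action.
    seen_fine_labels: set of fine-grained labels assumed present in training.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for video, fine_label in samples:
        # A sample is "unknown" when its fine-grained variation was never seen.
        split = "known" if fine_label in seen_fine_labels else "unknown"
        total[split] += 1
        if predict(video) == FINE_TO_COARSE[fine_label]:
            correct[split] += 1
    return {split: correct[split] / total[split] for split in total}


def transfer_drop(accuracy):
    """Accuracy lost when moving from known to unknown variations."""
    return accuracy["known"] - accuracy["unknown"]
```

Under these assumptions, a model that transfers motion concepts well would show a `transfer_drop` close to zero, while a large positive value would correspond to the kind of performance drop in novel contexts that the abstract reports.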

Code & Models

Datasets

raiyaanabdullah/Syn-TA
dataset · 89 downloads

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI