The Role of Video Generation in Enhancing Data-Limited Action Understanding

Wei Li; Dezhao Luo; Dongbao Yang; Zhenhang Li; Weiping Wang; Yu Zhou

arXiv:2505.19495 · cs.CV · October 13, 2025


TL;DR

This paper uses a text-to-video diffusion transformer to generate annotated training data, significantly improving data-limited action understanding and achieving state-of-the-art zero-shot action recognition performance.

Contribution

The paper presents a novel data augmentation method based on text-to-video diffusion transformers, together with strategies that increase the informativeness of generated samples and mitigate the effect of low-quality generated data.
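
Because each generated clip is produced from a text prompt that names the action, the class label comes for free with the sample. The abstract also says the information enhancement strategy targets two aspects, environments and characters. One hypothetical way to express that idea at the prompt level is to vary both for every class, as sketched below. `build_prompts`, `GenPrompt`, the prompt template, and the example vocabularies are illustrative assumptions, not the paper's prompts or pipeline, and the call to the video diffusion transformer itself is deliberately left out.

```python
import random
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GenPrompt:
    text: str   # prompt that would be passed to a text-to-video model
    label: int  # action class index; the annotation is known by construction


def build_prompts(class_names, environments, characters, per_class, seed=0):
    """Compose prompts that vary environment and character per action class.

    Hypothetical helper: the template and vocabularies are assumptions made
    for illustration only.
    """
    rng = random.Random(seed)
    combos = list(product(environments, characters))
    prompts = []
    for label, action in enumerate(class_names):
        # Draw distinct (environment, character) pairs so each clip differs.
        chosen = rng.sample(combos, k=min(per_class, len(combos)))
        for env, who in chosen:
            prompts.append(GenPrompt(f"{who} {action} {env}, realistic video", label))
    return prompts


if __name__ == "__main__":
    for p in build_prompts(
        ["doing push-ups", "juggling balls"],
        ["in a park", "in a gym"],
        ["A young woman", "An elderly man"],
        per_class=3,
    ):
        print(p.label, p.text)
```

Sampling without replacement from the environment-character grid keeps prompts within a class distinct, which is the property the sketch is meant to illustrate; a real pipeline would likely use far richer descriptions.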

Findings

01. Achieved state-of-the-art zero-shot action recognition results.
02. Generated realistic annotated video data at unlimited scale without human intervention.
03. Improved training on generated data with an information enhancement strategy and uncertainty-based label smoothing.

Abstract

Video action understanding tasks in real-world scenarios always suffer from data limitations. In this paper, we address the data-limited action understanding problem by alleviating data scarcity. We propose a novel method that employs a text-to-video diffusion transformer to generate annotated data for model training. This paradigm enables the generation of realistic annotated data at unlimited scale without human intervention. We propose an information enhancement strategy and an uncertainty-based label smoothing method tailored to training on generated samples. Through quantitative and qualitative analysis, we observed that real samples generally contain richer information than generated samples. Based on this observation, the information enhancement strategy is proposed to enhance the informative content of the generated samples from two aspects: the environments and the characters.…
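
The abstract is truncated before it defines the uncertainty-based label smoothing, so its exact form is not given here. A minimal sketch of the general idea, assuming that uncertainty is measured as the normalized predictive entropy of some classifier on each generated sample and that the smoothing strength grows linearly with it: ambiguous (possibly low-quality) generated samples receive softer targets. `soft_targets`, `smoothed_cross_entropy`, `max_eps`, and the entropy measure are hypothetical choices, not the paper's formulation.

```python
import numpy as np


def soft_targets(labels, probs, num_classes, max_eps=0.3):
    """Uncertainty-scaled label smoothing (hypothetical sketch).

    labels: (N,) integer class indices of generated samples
    probs:  (N, C) predicted class probabilities for those samples
            (assumed to come from some classifier; not specified by the source)
    """
    probs = np.clip(probs, 1e-12, 1.0)
    entropy = -(probs * np.log(probs)).sum(axis=1)
    uncertainty = entropy / np.log(num_classes)  # normalized to roughly [0, 1]
    eps = max_eps * uncertainty                   # per-sample smoothing strength
    one_hot = np.eye(num_classes)[labels]
    # Standard label smoothing, but with a per-sample epsilon.
    return (1.0 - eps)[:, None] * one_hot + (eps / num_classes)[:, None]


def smoothed_cross_entropy(logits, targets):
    """Cross-entropy against soft targets, averaged over the batch."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(targets * log_p).sum(axis=1).mean())


if __name__ == "__main__":
    probs = np.array([[0.98, 0.01, 0.01],    # confident prediction
                      [1 / 3, 1 / 3, 1 / 3]])  # maximally uncertain
    t = soft_targets(np.array([0, 2]), probs, num_classes=3)
    print(t.round(3))
    print(smoothed_cross_entropy(np.zeros((2, 3)), t))
```

With these inputs the confident sample keeps a nearly one-hot target, while the uniform-prediction sample is smoothed by the full `max_eps`. Setting `max_eps=0` recovers ordinary one-hot cross-entropy.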

Taxonomy

Topics: Action Observation and Synchronization · Sport Psychology and Performance

Methods: Diffusion · Label Smoothing