Synthetic Human Action Video Data Generation with Pose Transfer
Vaclav Knapp, Matyas Bohacek

TL;DR
This paper introduces a pose transfer-based method for generating realistic synthetic human action videos, enhancing training data diversity and improving action recognition performance.
Contribution
The paper presents a pose transfer technique based on controllable 3D Gaussian avatars and releases RANDOM People, a new dataset with diverse human identities, for research.
Findings
Improves action recognition performance on the Toyota Smarthome and NTU RGB+D datasets.
Scales few-shot datasets, making up for groups underrepresented in the real training data and adding diverse backgrounds.
Enhances data diversity for training models.
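
To illustrate how pose transfer can scale a few-shot dataset, the sketch below enumerates one synthetic clip per combination of real source clip, avatar identity, and background. It is only a planning step under stated assumptions, not the authors' pipeline: `RenderJob`, `plan_synthetic_clips`, and all file names, identity IDs, and background IDs are hypothetical, and the actual rendering with a 3D Gaussian avatar model is omitted.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RenderJob:
    """One synthetic clip to generate (hypothetical structure)."""
    source_clip: str  # real video whose pose sequence would drive the avatar
    label: str        # action label inherited from the source clip
    identity: str     # hypothetical avatar identity ID
    background: str   # hypothetical background ID


def plan_synthetic_clips(real_clips, identities, backgrounds):
    """Pair every (clip, label) with every identity and every background."""
    return [
        RenderJob(clip, label, identity, background)
        for (clip, label), identity, background in product(
            real_clips, identities, backgrounds
        )
    ]


# Hypothetical few-shot set: two labeled real clips.
real_clips = [("clip_001.mp4", "drink"), ("clip_002.mp4", "sit_down")]
identities = ["avatar_a", "avatar_b", "avatar_c"]
backgrounds = ["kitchen", "office"]

jobs = plan_synthetic_clips(real_clips, identities, backgrounds)
print(len(jobs))  # 2 clips x 3 identities x 2 backgrounds = 12
```

Each synthetic clip keeps the action label of its source clip, so the expanded set can be mixed with the real clips when training an action recognition model.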
Abstract
In video understanding tasks, particularly those involving human motion, synthetic data generation often suffers from uncanny features, diminishing its effectiveness for training. Tasks such as sign language translation, gesture recognition, and human motion understanding in autonomous driving have thus been unable to exploit the full potential of synthetic data. This paper proposes a method for generating synthetic human action video data using pose transfer (specifically, controllable 3D Gaussian avatar models). We evaluate this method on the Toyota Smarthome and NTU RGB+D datasets and show that it improves performance in action recognition tasks. Moreover, we demonstrate that the method can effectively scale few-shot datasets, making up for groups underrepresented in the real training data and adding diverse backgrounds. We open-source the method along with RANDOM People, a dataset…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Social Robot Interaction and HRI
