TL;DR
This paper explores the use of synthetic humans generated from monocular 3D reconstructions to enhance RGB human action recognition, especially for unseen viewpoints, achieving state-of-the-art results on multiple benchmarks.
Contribution
It introduces SURREACT, a new data generation method, and systematically investigates augmentation strategies to improve multi-view action recognition performance.
Findings
Training with synthetic humans yields significant performance improvements on the NTU RGB+D and UESTC benchmarks.
Augmenting the synthetic data improves recognition at unseen viewpoints.
Recognition also improves when only limited in-the-wild data is available.
Abstract
Although synthetic training data has been shown to be beneficial for tasks such as human pose estimation, its use for RGB human action recognition is relatively unexplored. Our goal in this work is to answer the question whether synthetic humans can improve the performance of human action recognition, with a particular focus on generalization to unseen viewpoints. We make use of the recent advances in monocular 3D human body reconstruction from real action sequences to automatically render synthetic training videos for the action labels. We make the following contributions: (i) we investigate the extent of variations and augmentations that are beneficial to improving performance at new viewpoints. We consider changes in body shape and clothing for individuals, as well as more action relevant augmentations such as non-uniform frame sampling, and interpolating between the motion of…
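The abstract lists non-uniform frame sampling among the action-relevant augmentations. As a rough illustration of the general idea only, the sketch below picks clip frames with randomly varying spacing so the same motion is seen at uneven speeds. The function name `sample_frames_nonuniform`, the gap range of 0.5 to 1.5, and the choice to always span the whole video are assumptions made for this example; they are not taken from the paper or from any SURREACT code.

```python
import random


def sample_frames_nonuniform(num_frames, clip_len, rng=None):
    """Return `clip_len` non-decreasing frame indices in [0, num_frames - 1].

    Hypothetical helper: spacing between consecutive indices is random,
    which locally speeds up or slows down the motion in the sampled clip.
    """
    if num_frames < 1 or clip_len < 1:
        raise ValueError("num_frames and clip_len must be positive")
    rng = rng or random.Random()
    if clip_len == 1:
        return [rng.randrange(num_frames)]

    # Draw one random positive gap per pair of consecutive samples
    # (the 0.5-1.5 range is an arbitrary assumption).
    gaps = [rng.uniform(0.5, 1.5) for _ in range(clip_len - 1)]
    total = sum(gaps)

    # Accumulate the gaps and rescale so the clip covers the whole video.
    indices = [0]
    position = 0.0
    for gap in gaps:
        position += gap
        scaled = round(position / total * (num_frames - 1))
        indices.append(min(scaled, num_frames - 1))
    return indices


if __name__ == "__main__":
    # Example: choose 16 frames from a 100-frame video.
    print(sample_frames_nonuniform(100, 16, random.Random(0)))
```

Uniform sampling would correspond to all gaps being equal; widening the gap range makes the speed variation stronger. When the video has fewer frames than the requested clip length, some indices repeat.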
