Action2video: Generating Videos of Human 3D Actions
Chuan Guo, Xinxin Zuo, Sen Wang, Xinshuang Liu, Shihao Zou, Minglun Gong, Li Cheng

TL;DR
This paper presents a novel two-step approach for generating realistic videos of human actions from specified categories, utilizing 3D pose representations, a temporal VAE for diversity, and detailed shape extraction from images.
Contribution
It introduces a new pipeline combining 3D pose generation and detailed shape rendering, with improvements in shape extraction and dataset reannotation, advancing human motion video synthesis.
Findings
The method produces diverse and realistic human motion videos.
It outperforms existing approaches in qualitative and quantitative evaluations.
The approach effectively integrates 3D pose and shape modeling for video synthesis.
Abstract
We aim to tackle the interesting yet challenging problem of generating videos of diverse and natural human motions from prescribed action categories. The key issue lies in the ability to synthesize multiple distinct motion sequences that are realistic in their visual appearances. It is achieved in this paper by a two-step process that maintains internal 3D pose and shape representations, action2motion and motion2video. Action2motion stochastically generates plausible 3D pose sequences of a prescribed action category, which are processed and rendered by motion2video to form 2D videos. Specifically, the Lie algebraic theory is engaged in representing natural human motions following the physical law of human kinematics; a temporal variational auto-encoder (VAE) is developed that encourages diversity of output motions. Moreover, given an additional input image of a clothed human character,…
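The abstract states that poses are represented with Lie algebraic theory so that generated motions respect human kinematics. As a rough illustration of that idea only, the sketch below applies the standard so(3) exponential map (Rodrigues' formula) to turn axis-angle joint parameters into rotations, then runs forward kinematics along a single chain. It is not the authors' implementation: the function names, the single-chain skeleton, and the convention that every bone points along the local x axis at rest are all assumptions made for this example.

```python
import math

# Illustrative sketch only; all names and the skeleton layout are hypothetical.

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix, an element of so(3)."""
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def identity():
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def exp_so3(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    theta = math.sqrt(sum(c * c for c in w))
    k = hat(w)
    k2 = matmul(k, k)
    if theta < 1e-8:
        # Taylor-series limits of the two coefficients near zero rotation.
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    eye = identity()
    return [[eye[i][j] + a * k[i][j] + b * k2[i][j] for j in range(3)]
            for i in range(3)]

def chain_positions(axis_angles, bone_lengths, root=(0.0, 0.0, 0.0)):
    """Forward kinematics along one chain of joints.

    Assumption: each bone lies along the local x axis at rest, and
    rotations accumulate from parent to child.
    """
    rot = identity()
    pos = list(root)
    joints = [pos]
    for w, length in zip(axis_angles, bone_lengths):
        rot = matmul(rot, exp_so3(w))
        offset = matvec(rot, [length, 0.0, 0.0])
        pos = [p + o for p, o in zip(pos, offset)]
        joints.append(pos)
    return joints
```

Because each joint only contributes a rotation, bone lengths stay fixed for any choice of parameters, which is one way a rotation-based representation of this kind can keep generated poses kinematically plausible.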
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging
