Deep Video Generation, Prediction and Completion of Human Action Sequences
Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang

TL;DR
This paper introduces a two-stage deep learning framework for human action videos that handles generation, prediction, and completion uniformly: it first generates a human pose sequence, then renders the video from those poses. The authors report that it outperforms existing methods.
Contribution
The paper proposes a novel two-stage approach that generates human action videos from random noise or from a subset of given frames, enabling high-quality, long-duration video synthesis and completion.
Findings
Outperforms state-of-the-art methods in video generation, prediction, and completion
Produces high-quality, longer-duration human action videos
Sidesteps the ill-posedness of video synthesis by generating a pose sequence first and rendering frames from it
Abstract
Current deep learning results on video generation are limited, there are only a few early results on video prediction, and there are no significant results on video completion. This is due to the severe ill-posedness inherent in these three problems. In this paper, we focus on human action videos and propose a general, two-stage deep framework to generate human action videos with no constraints or an arbitrary number of constraints, which uniformly addresses the three problems: video generation given no input frames, video prediction given the first few frames, and video completion given the first and last frames. To make the problem tractable, in the first stage we train a deep generative model that generates a human pose sequence from random noise. In the second stage, a skeleton-to-image network is trained, which is used to generate a human action video given the complete human pose…
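To make the framing in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows how the three tasks differ only in which frames are given as constraints, and how a two-stage pipeline composes a pose generator with a skeleton-to-image renderer. All names here (`constraint_mask`, `two_stage_pipeline`, `toy_pose_generator`, `toy_renderer`) are hypothetical, and the toy stand-ins replace the trained networks purely for illustration.

```python
from typing import Callable, List

Pose = List[float]         # assumption: a pose is a flat list of joint coordinates
Frame = List[List[float]]  # assumption: a frame is a small 2D grid of pixel values


def constraint_mask(task: str, num_frames: int, num_given: int = 0) -> List[bool]:
    """Return True for each frame that is supplied as input for the given task."""
    if task == "generation":   # no input frames
        return [False] * num_frames
    if task == "prediction":   # the first few frames are given
        return [i < num_given for i in range(num_frames)]
    if task == "completion":   # the first and last frames are given
        return [i in (0, num_frames - 1) for i in range(num_frames)]
    raise ValueError(f"unknown task: {task}")


def two_stage_pipeline(
    noise: List[float],
    pose_generator: Callable[[List[float]], List[Pose]],
    skeleton_to_image: Callable[[Pose], Frame],
) -> List[Frame]:
    """Stage 1 maps noise to a pose sequence; stage 2 renders one frame per pose."""
    poses = pose_generator(noise)
    return [skeleton_to_image(pose) for pose in poses]


def toy_pose_generator(noise: List[float]) -> List[Pose]:
    """Toy stand-in for the trained generative model: one 2-value pose per noise value."""
    return [[z, z] for z in noise]


def toy_renderer(pose: Pose) -> Frame:
    """Toy stand-in for the skeleton-to-image network: a 1-row 'image' of the pose."""
    return [list(pose)]
```

Calling `constraint_mask("completion", 4)` marks only the first and last frames as given, while `two_stage_pipeline([1.0, 2.0, 3.0], toy_pose_generator, toy_renderer)` returns one rendered frame per generated pose. How the constraints are actually enforced during pose generation is not described in the text above, so the sketch leaves that step out.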
