LARNet: Latent Action Representation for Human Action Synthesis
Naman Biyani, Aayush J Rana, Shruti Vyas, Yogesh S Rawat

TL;DR
LARNet is an end-to-end model that synthesizes human action videos by learning action dynamics in latent space, which removes the need for a driving video at inference. It integrates the generated dynamics with appearance through a hierarchical recurrent structure and uses a mix-adversarial loss to improve temporal coherence.
Contribution
It introduces a generative approach for action dynamics in latent space and a hierarchical recurrent structure with a mix-adversarial loss for video synthesis.
Findings
Effective in generating realistic human action videos
Outperforms existing methods on multiple datasets
Improves temporal coherence in synthesized videos
Abstract
We present LARNet, a novel end-to-end approach for generating human action videos. Joint generative modeling of appearance and dynamics for video synthesis is very challenging, and recent works in video synthesis have therefore proposed to decompose these two factors. However, these methods require a driving video to model the video dynamics. In this work, we propose a generative approach instead, which explicitly learns action dynamics in latent space, avoiding the need for a driving video during inference. The generated action dynamics are integrated with the appearance using a recurrent hierarchical structure, which induces motion at different scales to focus on both coarse and fine-level action details. In addition, we propose a novel mix-adversarial loss function which aims at improving the temporal coherency of synthesized videos. We evaluate the proposed approach on four…
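The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name, the fixed update rules, and the two-scale layout are assumptions chosen for illustration, and the learned networks are replaced by hand-written arithmetic. It only shows the shape of the idea: motion latents are rolled out from an action label and noise (no driving video), then merged with static appearance features by recurrent states at a coarse and a fine scale.

```python
import math
import random


def generate_latent_dynamics(action_id, num_frames, dim, seed=0):
    """Roll out a latent motion sequence from an action label and noise only.

    Hypothetical stand-in for a learned dynamics generator; note that no
    driving video is consumed anywhere.
    """
    rng = random.Random(seed * 1000 + action_id)
    # Assumed "action embedding": a fixed pseudo-random vector per label.
    embed = [rng.uniform(-1, 1) for _ in range(dim)]
    state = [rng.gauss(0, 1) for _ in range(dim)]
    sequence = []
    for _ in range(num_frames):
        # Recurrent update: next latent depends on the previous latent and the label.
        state = [math.tanh(0.9 * s + 0.5 * e) for s, e in zip(state, embed)]
        sequence.append(state)
    return sequence


def avg_pool(vec, factor):
    """Downsample a vector by averaging consecutive groups of `factor` entries."""
    return [sum(vec[i:i + factor]) / factor for i in range(0, len(vec), factor)]


def synthesize(appearance, action_id, num_frames):
    """Merge appearance features with generated motion at two scales.

    `appearance` is a flat feature vector of even length (an assumption of
    this sketch); the return value is one fine-scale feature vector per frame.
    """
    dim = len(appearance)
    motion = generate_latent_dynamics(action_id, num_frames, dim)
    coarse_app = avg_pool(appearance, 2)
    hidden_coarse = [0.0] * len(coarse_app)
    hidden_fine = [0.0] * dim
    frames = []
    for m in motion:
        # Coarse scale: recurrent state driven by pooled appearance and motion.
        m_coarse = avg_pool(m, 2)
        hidden_coarse = [
            math.tanh(h + a + z)
            for h, a, z in zip(hidden_coarse, coarse_app, m_coarse)
        ]
        # Fine scale: upsample the coarse state (nearest neighbour) and refine it.
        up = [h for h in hidden_coarse for _ in range(2)]
        hidden_fine = [
            math.tanh(h + u + a + z)
            for h, u, a, z in zip(hidden_fine, up, appearance, m)
        ]
        frames.append(hidden_fine)
    return frames


if __name__ == "__main__":
    video_features = synthesize([0.1] * 8, action_id=3, num_frames=5)
    print(len(video_features), len(video_features[0]))
```

In a real model the fine-scale features would be decoded into pixels and the whole pipeline trained adversarially; the sketch leaves out the mix-adversarial loss because the summary above does not specify its form.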
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Human Motion and Animation
