Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
Sihyun Yu, Jihoon Tack, Sangwoo Mo, Hyunsu Kim, Junho Kim, Jung-Woo Ha, Jinwoo Shin

TL;DR
This paper introduces DIGAN, a video generation framework built on implicit neural representations that models continuous dynamics, enabling high-quality long videos (128 frames), video extrapolation, and a 30.7% FVD improvement on UCF-101.
Contribution
The paper proposes a dynamics-aware INR-based GAN for video synthesis. It addresses the limitations of prior methods that represent videos as 3D grids of RGB values, improving motion realism and the length of generated videos.
Findings
Improves FVD score on UCF-101 by 30.7%
Generates longer videos (128 frames) than previous methods
Enables video extrapolation and non-autoregressive generation
Abstract
In the deep learning era, generating long, high-quality videos remains challenging due to the spatio-temporal complexity and continuity of videos. Prior works have attempted to model the video distribution by representing videos as 3D grids of RGB values, which limits the scale of generated videos and neglects continuous dynamics. In this paper, we find that the recently emerging paradigm of implicit neural representations (INRs), which encode a continuous signal into a parameterized neural network, effectively mitigates the issue. By utilizing INRs of video, we propose the dynamics-aware implicit generative adversarial network (DIGAN), a novel generative adversarial network for video generation. Specifically, we introduce (a) an INR-based video generator that improves the motion dynamics by manipulating the space and time coordinates differently and (b) a motion discriminator…
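To make the idea of a video INR concrete, here is a minimal NumPy sketch of a network that maps a continuous coordinate (x, y, t) to an RGB value. It is a toy under stated assumptions, not the DIGAN generator: the class name `ToyVideoINR`, the Fourier-feature encoding, the layer sizes, the random (untrained) weights, and the choice of a lower base frequency for time than for space are all hypothetical and chosen only to illustrate treating space and time coordinates differently.

```python
import numpy as np


def fourier_features(coords, num_freqs, base_freq):
    """Encode coords of shape (N, D) as sin/cos features of shape (N, 2*D*num_freqs)."""
    freqs = base_freq * (2.0 ** np.arange(num_freqs))      # (F,)
    angles = 2.0 * np.pi * coords[:, :, None] * freqs      # (N, D, F)
    angles = angles.reshape(coords.shape[0], -1)           # (N, D*F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


class ToyVideoINR:
    """Hypothetical two-layer MLP mapping (x, y, t) to RGB; weights are random, not trained."""

    def __init__(self, hidden=64, space_freqs=6, time_freqs=3, seed=0):
        rng = np.random.default_rng(seed)
        self.space_freqs = space_freqs
        self.time_freqs = time_freqs
        # 2 spatial dims and 1 temporal dim, each with sin and cos features.
        in_dim = 2 * 2 * space_freqs + 2 * 1 * time_freqs
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 3))
        self.b2 = np.zeros(3)

    def render_frame(self, t, height, width):
        """Evaluate the INR on a height x width pixel grid at continuous time t."""
        ys, xs = np.meshgrid(
            np.linspace(0.0, 1.0, height),
            np.linspace(0.0, 1.0, width),
            indexing="ij",
        )
        xy = np.stack([xs.ravel(), ys.ravel()], axis=1)    # (H*W, 2)
        tt = np.full((xy.shape[0], 1), float(t))           # (H*W, 1)
        # Assumption for illustration: time is encoded with a lower base
        # frequency than space, so the output varies more slowly along t.
        feats = np.concatenate(
            [
                fourier_features(xy, self.space_freqs, base_freq=1.0),
                fourier_features(tt, self.time_freqs, base_freq=0.25),
            ],
            axis=1,
        )
        hidden = np.tanh(feats @ self.w1 + self.b1)
        rgb = 1.0 / (1.0 + np.exp(-(hidden @ self.w2 + self.b2)))  # sigmoid to (0, 1)
        return rgb.reshape(height, width, 3)


inr = ToyVideoINR()
# Each frame is queried independently, so no frame depends on a previous one.
# t = 1.5 lies outside the nominal [0, 1] range, a crude stand-in for extrapolation.
times = [0.0, 0.5, 1.0, 1.5]
video = np.stack([inr.render_frame(t, height=16, width=16) for t in times])
```

Because the video is a function of continuous coordinates rather than a fixed grid, the same network can be sampled at any resolution or timestamp, and frames can be rendered in any order or in parallel.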
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Human Pose and Action Recognition
