Learning Energy-based Spatial-Temporal Generative ConvNets for Dynamic Patterns
Jianwen Xie, Song-Chun Zhu, Ying Nian Wu

TL;DR
This paper introduces an energy-based spatial-temporal generative ConvNet for modeling and synthesizing dynamic video patterns, capable of learning from incomplete data and capturing complex spatial-temporal structures.
Contribution
The paper proposes an energy-based spatial-temporal generative ConvNet for synthesizing dynamic patterns and shows that the model can be learned from incomplete video sequences.
Findings
Synthesizes realistic dynamic video patterns.
Learns from incomplete sequences with occlusions or missing frames.
Learns the dynamic patterns and completes the missing data at the same time.
Abstract
Video sequences contain rich dynamic patterns, such as dynamic texture patterns that exhibit stationarity in the temporal domain, and action patterns that are non-stationary in either spatial or temporal domain. We show that an energy-based spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns. The model defines a probability distribution on the video sequence, and the log probability is defined by a spatial-temporal ConvNet that consists of multiple layers of spatial-temporal filters to capture spatial-temporal patterns of different scales. The model can be learned from the training video sequences by an "analysis by synthesis" learning algorithm that iterates the following two steps. Step 1 synthesizes video sequences from the currently learned model. Step 2 then updates the model parameters based on the difference between the synthesized video…
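The two-step "analysis by synthesis" loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: a fixed random linear filter bank `A` with a `tanh` nonlinearity stands in for the multi-layer spatial-temporal ConvNet, only the top-layer weights `w` are learned, the density is taken as exp(f(Y; w)) times a Gaussian reference distribution, and Step 1 uses Langevin dynamics as the sampler (the truncated abstract above does not name one). All function names and step sizes are hypothetical.

```python
import numpy as np

def features(Y, A):
    """Filter responses tanh(A y) for each flattened clip in the batch Y, shape (n, d)."""
    return np.tanh(Y @ A.T)

def score(Y, A, w):
    """f(Y; w) = w . tanh(A y): the learned part of the log-density, up to a constant."""
    return features(Y, A) @ w

def grad_score(Y, A, w):
    """Gradient of f(Y; w) with respect to each clip y."""
    H = features(Y, A)
    return ((1.0 - H**2) * w) @ A

def langevin(Y, A, w, steps=30, delta=0.1, sigma=1.0, rng=None):
    """Step 1 (assumed sampler): Langevin updates toward exp(f(Y; w)) * N(0, sigma^2 I)."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(steps):
        grad_log_p = grad_score(Y, A, w) - Y / sigma**2
        Y = Y + 0.5 * delta**2 * grad_log_p + delta * rng.standard_normal(Y.shape)
    return Y

def learn(Y_obs, A, iters=50, lr=0.05, n_syn=8, seed=0):
    """Alternate synthesis (Step 1) and parameter update (Step 2)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(A.shape[0])
    # Synthesized clips start as noise and are carried over between iterations.
    Y_syn = rng.standard_normal((n_syn, Y_obs.shape[1]))
    for _ in range(iters):
        Y_syn = langevin(Y_syn, A, w, rng=rng)
        # Step 2: move w along the difference between observed and synthesized statistics.
        w = w + lr * (features(Y_obs, A).mean(axis=0) - features(Y_syn, A).mean(axis=0))
    return w, Y_syn

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 48)) / 4.0       # 6 toy filters over a 3-frame 4x4 clip
    Y_obs = rng.standard_normal((5, 48)) + 1.0   # made-up "training videos"
    w, Y_syn = learn(Y_obs, A)
    print(w.shape, Y_syn.shape)
```

Because `f` is linear in `w` in this toy model, the Step 2 update is the gradient of the log-likelihood with the model expectation replaced by an average over the synthesized clips. In a full ConvNet the same difference would be taken over gradients of the network output with respect to all filter parameters.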
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Computer Graphics and Visualization Techniques
