Local Frequency Domain Transformer Networks for Video Prediction

Hafez Farazi; Jan Nogga; Sven Behnke

arXiv:2105.04637 · cs.CV · May 12, 2021

TL;DR

This paper introduces a fully differentiable building block for video prediction that disentangles three tasks: extracting transformations, projecting them into the future, and transforming the current frame. Separating the tasks keeps the model interpretable, and the method extends to motion segmentation.

Contribution

The paper proposes an interpretable, fully differentiable module for video prediction that performs, as separate steps, the tasks usually entangled inside recurrent layers. The module also extends to motion segmentation and to modeling scene composition.

Findings

01 Effective on synthetic and real data

02 Enables motion segmentation and scene composition understanding

03 Produces reliable, interpretable predictions

Abstract

Video prediction is commonly referred to as forecasting future frames of a video sequence provided several past frames thereof. It remains a challenging domain as visual scenes evolve according to complex underlying dynamics, such as the camera's egocentric motion or the distinct motility per individual object viewed. These are mostly hidden from the observer and manifest as often highly non-linear transformations between consecutive video frames. Therefore, video prediction is of interest not only in anticipating visual changes in the real world but has, above all, emerged as an unsupervised learning rule targeting the formation and dynamics of the observed environment. Many of the deep learning-based state-of-the-art models for video prediction utilize some form of recurrent layers like Long Short-Term Memory (LSTMs) or Gated Recurrent Units (GRUs) at the core of their models.…

Code & Models

Repositories

AIS-Bonn/Local_Freq_Transformer_Net
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Vision and Imaging