Fourier-based Video Prediction through Relational Object Motion
Malte Mosbach, Sven Behnke

TL;DR
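This paper explores a frequency-domain method for video prediction that explicitly infers object-motion relationships in the observed scene. Its predictions are consistent with the observed dynamics and do not suffer from blur, in contrast to deep recurrent models, which often produce blurry output and require tedious training on large datasets.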
Contribution
It proposes a Fourier-based approach combined with relational object motion inference, offering an alternative to recurrent neural networks for video prediction.
Findings
Predictions are consistent with observed scene dynamics.
Predictions do not suffer from the blur that deep recurrent models often produce.
The approach is presented as an alternative to recurrent models that require tedious training on large datasets.
Abstract
The ability to predict future outcomes conditioned on observed video frames is crucial for intelligent decision-making in autonomous systems. Recently, deep recurrent architectures have been applied to the task of video prediction. However, this often results in blurry predictions and requires tedious training on large datasets. Here, we explore a different approach by (1) using frequency-domain approaches for video prediction and (2) explicitly inferring object-motion relationships in the observed scene. The resulting predictions are consistent with the observed dynamics in a scene and do not suffer from blur.
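To make the frequency-domain idea concrete, here is a minimal sketch based on the Fourier shift theorem: a translation in image space becomes a per-frequency phase rotation, so the phase change between two observed frames can be applied once more to extrapolate the next frame. This is an illustration under strong assumptions (a single global, constant, circular translation), not the authors' method; it does not cover per-object motion or the relational inference the abstract describes. The function name `predict_next_frame` and the toy frames are hypothetical.

```python
import numpy as np


def predict_next_frame(prev, curr, eps=1e-8):
    """Extrapolate one frame ahead, assuming constant global translation.

    Hypothetical helper for illustration only.
    """
    F_prev = np.fft.fft2(prev)
    F_curr = np.fft.fft2(curr)

    # Normalized cross-power spectrum: the per-frequency phase rotation
    # that maps the previous frame onto the current one.
    cross = F_curr * np.conj(F_prev)
    phase_shift = cross / (np.abs(cross) + eps)

    # Applying the same rotation again moves the content one more step.
    F_next = F_curr * phase_shift
    return np.real(np.fft.ifft2(F_next))


# Toy scene (assumed): a bright square moving (1, 2) pixels per frame,
# with wrap-around at the borders so the shift theorem holds exactly.
frame0 = np.zeros((32, 32))
frame0[8:12, 8:12] = 1.0
frame1 = np.roll(frame0, shift=(1, 2), axis=(0, 1))

predicted = predict_next_frame(frame0, frame1)
expected = np.roll(frame1, shift=(1, 2), axis=(0, 1))
print(np.allclose(predicted, expected, atol=1e-6))
```

Because the prediction only rotates phases and leaves magnitudes untouched, the extrapolated frame stays as sharp as the input, which gives some intuition for why a frequency-domain formulation can avoid blur. Scenes with several independently moving objects would need the motion to be estimated and applied per object rather than globally.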
