VideoPCDNet: Video Parsing and Prediction with Phase Correlation Networks
Noel José Rodrigues Vicente, Enrique Lehner, Angel Villar-Corrales, Jan Nogga, Sven Behnke

TL;DR
VideoPCDNet introduces an unsupervised, object-centric framework for video parsing and prediction using frequency-domain phase correlation, enabling accurate tracking and future frame prediction with interpretable representations.
Contribution
It presents a novel frequency-domain approach for unsupervised object decomposition and motion modeling in videos, improving tracking and prediction accuracy.
Findings
Outperforms multiple object-centric baseline models on synthetic datasets.
Learns interpretable object and motion representations.
Enables accurate unsupervised object tracking and prediction.
Abstract
Understanding and predicting video content is essential for planning and reasoning in dynamic environments. Despite advancements, unsupervised learning of object representations and dynamics remains challenging. We present VideoPCDNet, an unsupervised framework for object-centric video decomposition and prediction. Our model uses frequency-domain phase correlation techniques to recursively parse videos into object components, which are represented as transformed versions of learned object prototypes, enabling accurate and interpretable tracking. By explicitly modeling object motion through a combination of frequency domain operations and lightweight learned modules, VideoPCDNet enables accurate unsupervised object tracking and prediction of future video frames. In our experiments, we demonstrate that VideoPCDNet outperforms multiple object-centric baseline models for unsupervised…
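For readers unfamiliar with the frequency-domain operation the abstract refers to, the following is a minimal sketch of classical phase correlation for estimating a translation between two images. It is not the authors' implementation: the function name `phase_correlation_shift`, the stabilizing constant `1e-8`, and the assumption of a purely circular, integer-pixel shift are illustrative choices, and VideoPCDNet's learned modules and object prototypes are not modeled here.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Estimate the integer (dy, dx) circular shift that maps `reference` onto `moved`.

    Hypothetical helper for illustration; assumes both inputs are 2D arrays
    of the same shape and that `moved` is a circularly shifted copy of `reference`.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)

    # Normalized cross-power spectrum: keeps only the phase difference,
    # which encodes the translation between the two images.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-8  # small constant avoids division by zero

    # The inverse transform is (approximately) a delta peak at the shift.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    return tuple(
        int(p) if p <= size // 2 else int(p - size)
        for p, size in zip(peak, correlation.shape)
    )


# Usage: shift a random image by a known offset and recover that offset.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
shifted = np.roll(image, shift=(3, -5), axis=(0, 1))
estimated = phase_correlation_shift(image, shifted)
print(estimated)
```

Because the shift appears as a single peak in the correlation surface, the estimated motion can be read off directly, which is one reason phase-correlation-based representations are described as interpretable.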
Taxonomy
Topics: Image and Signal Denoising Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
