VideoPCDNet: Video Parsing and Prediction with Phase Correlation Networks

Noel José Rodrigues Vicente; Enrique Lehner; Angel Villar-Corrales; Jan Nogga; Sven Behnke

arXiv:2506.19621·cs.CV·June 25, 2025

Open Access

TL;DR

VideoPCDNet introduces an unsupervised, object-centric framework for video parsing and prediction using frequency-domain phase correlation, enabling accurate tracking and future frame prediction with interpretable representations.

Contribution

The paper presents a frequency-domain approach to unsupervised object decomposition and motion modeling in videos, aimed at improving tracking and prediction accuracy.

Findings

1. Outperforms multiple object-centric baseline models on synthetic datasets.
2. Learns interpretable object and motion representations.
3. Enables accurate unsupervised object tracking and prediction.

Abstract

Understanding and predicting video content is essential for planning and reasoning in dynamic environments. Despite advancements, unsupervised learning of object representations and dynamics remains challenging. We present VideoPCDNet, an unsupervised framework for object-centric video decomposition and prediction. Our model uses frequency-domain phase correlation techniques to recursively parse videos into object components, which are represented as transformed versions of learned object prototypes, enabling accurate and interpretable tracking. By explicitly modeling object motion through a combination of frequency domain operations and lightweight learned modules, VideoPCDNet enables accurate unsupervised object tracking and prediction of future video frames. In our experiments, we demonstrate that VideoPCDNet outperforms multiple object-centric baseline models for unsupervised…
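
As background for the phase correlation technique named in the abstract, here is a minimal NumPy sketch of classical phase correlation between two frames. It is a generic illustration, not VideoPCDNet's implementation: the function name `phase_correlation_shift`, the `eps` stabilizer, and the assumption of a purely circular integer translation are all choices made for this example.

```python
import numpy as np


def phase_correlation_shift(reference, moved, eps=1e-8):
    """Estimate the integer circular shift (dy, dx) such that
    moved ~= np.roll(reference, (dy, dx), axis=(0, 1)).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)

    # Normalized cross-power spectrum: keeps only the phase difference,
    # which encodes the translation between the two images.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + eps  # eps avoids division by zero

    # The inverse transform is (ideally) a delta peak at the shift.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    shift = tuple(
        int(p - size) if p > size // 2 else int(p)
        for p, size in zip(peak, correlation.shape)
    )
    return shift


# Usage: shift a random frame by a known offset and recover it.
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
shifted = np.roll(frame, (3, -5), axis=(0, 1))
estimated = phase_correlation_shift(frame, shifted)
print(estimated)
```

Because the spectrum is normalized to unit magnitude, the estimate depends on phase alone, which is what makes the correlation peak sharp. Real video frames are not circularly shifted, so practical pipelines typically add windowing or padding; that is outside the scope of this sketch.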

Taxonomy

Topics: Image and Signal Denoising Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition