Self-Supervised Decomposition, Disentanglement and Prediction of Video Sequences while Interpreting Dynamics: A Koopman Perspective
Armand Comas, Sandesh Ghimire, Haolin Li, Mario Sznaier, Octavia Camps

TL;DR
This paper introduces a self-supervised method to decompose videos into objects and interpret their dynamics using Koopman embeddings, enabling trajectory forecasting and dynamic manipulation.
Contribution
It presents a novel approach combining object decomposition with Koopman operator theory for dynamic interpretation in videos.
Findings
Successfully decomposes videos into objects and attributes
Forecasts challenging trajectories accurately
Enables interpretation and manipulation of scene dynamics
Abstract
Human interpretation of the world encompasses the use of symbols to categorize sensory inputs and compose them in a hierarchical manner. One of the long-term objectives of Computer Vision and Artificial Intelligence is to endow machines with the capacity of structuring and interpreting the world as we do. Towards this goal, recent methods have successfully been able to decompose and disentangle video sequences into their composing objects and dynamics, in a self-supervised fashion. However, there has been a scarce effort in giving interpretation to the dynamics of the scene. We propose a method to decompose a video into moving objects and their attributes, and model each object's dynamics with linear system identification tools, by means of a Koopman embedding. This allows interpretation, manipulation and extrapolation of the dynamics of the different objects by employing the Koopman…
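The abstract's core idea, modeling each object's dynamics with linear system identification in a Koopman embedding, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-object latent embeddings are already available (the learned encoder is not shown) and swaps in a plain least-squares, DMD-style fit for whatever identification procedure the paper uses. The names `fit_koopman`, `rollout`, and `K_true`, and the toy trajectory, are hypothetical.

```python
import numpy as np

def fit_koopman(Z):
    """Least-squares estimate of K such that Z[t+1] ~= K @ Z[t].

    Z has shape (T, d): one d-dimensional latent state per frame.
    Assumption: the latents come from some encoder that is not shown here.
    """
    X, Y = Z[:-1], Z[1:]
    # Solve X @ K.T ~= Y in the least-squares sense.
    Kt, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Kt.T

def rollout(K, z0, steps):
    """Apply the linear operator K repeatedly, starting from z0."""
    states = [z0]
    for _ in range(steps):
        states.append(K @ states[-1])
    return np.stack(states)

# Toy latent trajectory: a slowly decaying rotation (hypothetical data).
theta = 0.1
K_true = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
Z = rollout(K_true, np.array([1.0, 0.0]), 50)

K_est = fit_koopman(Z)

# Interpretation: eigenvalue magnitude gives decay or growth per step,
# eigenvalue angle gives the oscillation frequency.
eigvals = np.linalg.eigvals(K_est)

# Extrapolation: forecast 10 steps past the last observed state.
future = rollout(K_est, Z[-1], 10)
```

In this sketch, interpretation comes from the eigenvalues of the fitted operator, and manipulation would amount to editing those eigenvalues before rolling the system forward. Decoding the forecast latents back to frames is outside the scope of the example.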
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: Test
