Self-Supervised Decomposition, Disentanglement and Prediction of Video Sequences while Interpreting Dynamics: A Koopman Perspective

Armand Comas; Sandesh Ghimire; Haolin Li; Mario Sznaier; Octavia Camps

arXiv:2110.00547 · cs.CV · October 4, 2021 · 1 cites

TL;DR

This paper introduces a self-supervised method to decompose videos into objects and interpret their dynamics using Koopman embeddings, enabling trajectory forecasting and dynamic manipulation.

Contribution

The paper combines self-supervised object decomposition with Koopman operator theory to interpret object dynamics in videos.

Findings

1. Successfully decomposes videos into objects and attributes
2. Forecasts challenging trajectories accurately
3. Enables interpretation and manipulation of scene dynamics

Abstract

Human interpretation of the world involves using symbols to categorize sensory inputs and compose them hierarchically. One of the long-term objectives of Computer Vision and Artificial Intelligence is to endow machines with the capacity to structure and interpret the world as we do. Towards this goal, recent methods have successfully decomposed and disentangled video sequences into their constituent objects and dynamics in a self-supervised fashion. However, little effort has been made to interpret the dynamics of the scene. We propose a method to decompose a video into moving objects and their attributes, and to model each object's dynamics with linear system identification tools by means of a Koopman embedding. This allows interpretation, manipulation and extrapolation of the dynamics of the different objects by employing the Koopman…
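The linear system identification idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' model: it assumes an embedding has already been obtained (here replaced by a synthetic 2-D damped rotation), and it fits a single linear operator by ordinary least squares. The names `fit_linear_operator`, `rollout`, `K_true` and `K_hat` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def fit_linear_operator(Z):
    """Least-squares fit of K such that Z[t+1] ≈ K @ Z[t].

    Z has shape (T, d): one d-dimensional embedding per time step.
    """
    X, Y = Z[:-1], Z[1:]
    # Row form of the recurrence is Y = X @ K.T, so lstsq returns K.T.
    K_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return K_T.T


def rollout(K, z0, steps):
    """Extrapolate by repeatedly applying K, starting from z0."""
    out = [z0]
    for _ in range(steps):
        out.append(K @ out[-1])
    return np.stack(out)


# Assumption: a synthetic stand-in for one object's embedded trajectory,
# generated by a damped rotation (not data from the paper).
theta, decay = 0.3, 0.98
K_true = decay * np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta),  np.cos(theta)]])
Z = rollout(K_true, np.array([1.0, 0.0]), steps=30)

K_hat = fit_linear_operator(Z)
eigvals = np.linalg.eigvals(K_hat)

# Eigenvalue modulus is the per-step decay; the angle is the rotation rate.
print("modulus:", np.abs(eigvals))
print("angle:", np.abs(np.angle(eigvals)))

# Forecast 10 steps beyond the observed data using the fitted operator.
future = rollout(K_hat, Z[-1], steps=10)
```

In this linear picture, the eigenvalues of the fitted operator summarize the dynamics (decay and oscillation frequency), and rescaling them changes how the forecast behaves. That is one plausible reading of what interpretation and manipulation of dynamics could mean in a linear setting, offered here as an assumption and not as a description of the paper's procedure.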

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition

Methods: Test