Self-Supervision by Prediction for Object Discovery in Videos

Beril Besbinar; Pascal Frossard

arXiv:2103.05669 · cs.CV · March 11, 2021

TL;DR

This paper introduces a self-supervised, object-centric model for video prediction that separates object representations from motion dynamics, explicitly handles occlusion, and requires no manual annotations.

Contribution

The paper presents a self-supervised framework for object discovery and prediction in videos that explicitly models occlusion and background without manual annotations.

Findings

01. Effective disentanglement of objects and motion dynamics

02. Handles occlusion and inpaints inferred objects

03. Promising results in object-centric video prediction

Abstract

Despite their irresistible success, deep learning algorithms still heavily rely on annotated data. On the other hand, unsupervised settings pose many challenges, especially about determining the right inductive bias in diverse scenarios. One scalable solution is to make the model generate the supervision for itself by leveraging some part of the input data, which is known as self-supervised learning. In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation. In addition to disentangling the notion of objects and the motion dynamics, our compositional structure explicitly handles occlusion and inpaints inferred objects and background for the composition of the predicted frame. With the aid of auxiliary loss functions that promote spatially and temporally consistent object representations, our self-supervised…
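
The compositional idea in the abstract, where inferred objects and a background are combined into a predicted frame with occlusion handled explicitly, can be illustrated with a minimal sketch. This is not the authors' architecture: in the paper the object appearances, masks, and background are inferred by the model, whereas here they are given as plain inputs. The function name `composite_frame`, the back-to-front layer ordering, and the grayscale nested-list image format are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed format: a grayscale image as rows of floats in [0, 1].
Image = List[List[float]]


def composite_frame(background: Image, layers: List[Tuple[Image, Image]]) -> Image:
    """Alpha-composite object layers over a background, back to front.

    Each layer is an (appearance, mask) pair with mask values in [0, 1].
    A later layer occludes earlier ones wherever its mask is non-zero.
    Hypothetical helper for illustration, not taken from the paper.
    """
    frame = [row[:] for row in background]  # copy so the input is untouched
    for appearance, mask in layers:
        for y, row in enumerate(frame):
            for x, below in enumerate(row):
                alpha = mask[y][x]
                # Standard "over" blend: the object where the mask is on,
                # whatever was already composited where it is off.
                row[x] = alpha * appearance[y][x] + (1.0 - alpha) * below
    return frame


# Two objects on a 1x3 black background; the second overlaps the first
# at the middle pixel and therefore occludes it there.
background = [[0.0, 0.0, 0.0]]
far_object = ([[0.5, 0.5, 0.5]], [[1.0, 1.0, 0.0]])
near_object = ([[1.0, 1.0, 1.0]], [[0.0, 1.0, 1.0]])
frame = composite_frame(background, [far_object, near_object])
print(frame)  # [[0.5, 1.0, 1.0]]
```

Because each object keeps a full appearance map even where it is hidden, the occluded part of the far object is still available if the near object moves away in a later frame, which is the intuition behind inpainting inferred objects and background before composition.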
