Object-centric Video Prediction without Annotation

Karl Schmeckpeper; Georgios Georgakis; Kostas Daniilidis

arXiv:2105.02799 · cs.CV · May 7, 2021

TL;DR

This paper introduces OPA, an object-centric video prediction method that learns without dense annotations by leveraging pre-trained vision models, enabling better understanding and control of dynamic scenes.

Contribution

OPA is presented as the first object-centric video prediction approach that trains without dense object annotations.

Findings

01 Successfully predicts object dynamics in falling-object videos.

02 Adapts perception models through end-to-end training.

03 Demonstrates improved scene understanding without manual annotations.

Abstract

In order to interact with the world, agents must be able to predict the results of the world's dynamics. A natural approach to learn about these dynamics is through video prediction, as cameras are ubiquitous and powerful sensors. Direct pixel-to-pixel video prediction is difficult, does not take advantage of known priors, and does not provide an easy interface to utilize the learned dynamics. Object-centric video prediction offers a solution to these problems by taking advantage of the simple prior that the world is made of objects and by providing a more natural interface for control. However, existing object-centric video prediction pipelines require dense object annotations in training video sequences. In this work, we present Object-centric Prediction without Annotation (OPA), an object-centric video prediction method that takes advantage of priors from powerful computer vision…

Code & Models

Repositories

kschmeckpeper/opa (PyTorch, official)
