Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities

Andrii Zadaianchuk; Maximilian Seitzer; Georg Martius

arXiv:2306.04829 · cs.CV · March 18, 2024 · 1 citation

TL;DR

This paper introduces a novel temporal feature similarity loss for unsupervised object-centric learning in videos, enabling scalable and effective discovery of objects in real-world, unconstrained video datasets.

Contribution

It proposes a new loss function that trains the model to predict temporal similarities between pre-trained self-supervised features, improving object discovery in videos and scaling to large, real-world datasets.
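To make the idea of a similarity-based target concrete, here is a minimal NumPy sketch. It assumes the target is built from cosine similarities between the patch features of frame t and frame t+k, turned into a row-wise distribution with a temperature-scaled softmax, and that the model is trained with a cross-entropy loss against predicted logits. The function names, the temperature `tau`, and the random stand-in features are hypothetical illustrations; this is not the official martius-lab/videosaur implementation, which may differ in its details.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable log-softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def similarity_targets(feats_t: np.ndarray, feats_tk: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Hypothetical target matrix of shape (N, N).

    Row i is a distribution over the N patches of frame t+k, weighted by the
    cosine similarity of each patch to patch i of frame t.
    feats_t, feats_tk: (N, D) patch features from a frozen self-supervised encoder.
    tau: softmax temperature (assumed default, not taken from the paper).
    """
    a = feats_t / np.linalg.norm(feats_t, axis=-1, keepdims=True)
    b = feats_tk / np.linalg.norm(feats_tk, axis=-1, keepdims=True)
    cos = a @ b.T  # (N, N) pairwise cosine similarities across time
    return np.exp(log_softmax(cos / tau, axis=-1))


def similarity_loss(pred_logits: np.ndarray, targets: np.ndarray) -> float:
    """Cross-entropy between target rows and predicted rows, averaged over patches.

    pred_logits: (N, N) scores that a decoder would produce from frame t's slots.
    """
    return float(-(targets * log_softmax(pred_logits, axis=-1)).sum(axis=-1).mean())


# Usage with random stand-in features (N=16 patches, D=32 channels).
rng = np.random.default_rng(0)
f_t = rng.normal(size=(16, 32))
f_tk = f_t + 0.1 * rng.normal(size=(16, 32))  # "next frame": small perturbation
P = similarity_targets(f_t, f_tk)
loss = similarity_loss(rng.normal(size=(16, 16)), P)
```

Because each row of the target is a probability distribution over future patches, the loss is minimized when the predicted row matches it, and patches that move together get similar rows. This is one way to read the "motion bias" described in the abstract.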

Findings

1. Achieves state-of-the-art results on the synthetic MOVi datasets.
2. First object-centric video model to scale to unconstrained real-world datasets such as YouTube-VIS.
3. Demonstrates the effectiveness of the temporal feature similarity loss for object discovery.

Abstract

Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes semantic and temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first…
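The abstract combines the similarity loss with a feature reconstruction loss. The following minimal sketch shows one way such a combined objective could look. It assumes the reconstruction term is a mean squared error between decoded and pre-trained patch features, and that the two terms are added with a hypothetical weight `alpha`. The weighting scheme and default value are assumptions for illustration, not values reported in the listing above.

```python
import numpy as np


def feature_reconstruction_loss(recon: np.ndarray, target_feats: np.ndarray) -> float:
    """Mean squared error between decoded and pre-trained patch features, both (N, D)."""
    return float(((recon - target_feats) ** 2).mean())


def combined_loss(sim_loss: float, recon: np.ndarray, target_feats: np.ndarray,
                  alpha: float = 0.1) -> float:
    """Similarity loss plus an alpha-weighted reconstruction loss.

    alpha is a hypothetical weight chosen for illustration only.
    """
    return sim_loss + alpha * feature_reconstruction_loss(recon, target_feats)


# Usage: a reconstruction that is off by 1.0 everywhere has MSE 1.0.
target = np.ones((4, 8))
recon = np.zeros((4, 8))
total = combined_loss(2.0, recon, target, alpha=0.5)  # 2.0 + 0.5 * 1.0 = 2.5
```

Keeping the similarity term as a plain scalar input means the sketch does not commit to any particular decoder. In practice, both terms would be computed from the same slot representation and backpropagated jointly.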

Code & Models

Repositories

martius-lab/videosaur
PyTorch · Official

Videos

Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis