Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos

Gautam Singh; Yi-Fu Wu; Sungjin Ahn

arXiv:2205.14065·cs.CV·May 30, 2022·20 cites

TL;DR

STEVE is a simple yet effective unsupervised object-centric learning model for complex naturalistic videos, achieving significant improvements without added complexity or supervision.

Contribution

The paper introduces a straightforward transformer-based architecture for object-centric learning in videos that handles complex scenes without additional supervision.

Findings

01. Outperforms previous methods on complex naturalistic videos
02. Uses a simple architecture without extra supervision
03. Achieves significant improvements in object-centric learning

Abstract

Unsupervised object-centric learning aims to represent the modular, compositional, and causal structure of a scene as a set of object representations and thereby promises to resolve many critical limitations of traditional single-vector representations such as poor systematic generalization. Although there have been many remarkable advances in recent years, one of the most critical problems in this direction has been that previous methods work only with simple and synthetic scenes but not with complex and naturalistic images or videos. In this paper, we propose STEVE, an unsupervised model for object-centric learning in videos. Our proposed model makes a significant advancement by demonstrating its effectiveness on various complex and naturalistic videos unprecedented in this line of research. Interestingly, this is achieved by neither adding complexity to the model architecture nor…

Code & Models

Repositories

singhgautam/steve
pytorch

Videos

Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos · slideslive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging