Self-supervised Object-Centric Learning for Videos

Görkay Aydemir; Weidi Xie; Fatma Güney

arXiv:2310.06907 · cs.CV · October 12, 2023 · 2 cites

TL;DR

This paper introduces a fully unsupervised object-centric learning method for segmenting multiple objects in real-world videos, leveraging temporal relations and high-level features without additional modalities.

Contribution

It presents the first unsupervised approach that spatially binds objects to slots and relates them across frames for real-world video segmentation.

Findings

01. Successfully segments multiple complex objects in YouTube videos

02. Operates without supervised labels or additional modalities

03. Uses a novel masking and merging strategy for improved performance

Abstract

Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the segmentation in video sequences. However, the performance improvements observed in synthetic sequences, which rely on the robustness of an additional cue, do not translate to more challenging real-world scenarios. In this paper, we propose the first fully unsupervised method for segmenting multiple objects in real-world sequences. Our object-centric learning framework spatially binds objects to slots on each frame and then relates these slots across frames. From these temporally-aware slots, the training objective is to reconstruct the middle frame in a high-level semantic feature space. We propose a masking strategy by dropping a significant portion of tokens…
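
The token-dropping step mentioned at the end of the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `drop_tokens` and `mask_clip`, the keep ratio of 0.25, and the choice to sample an independent mask per frame are all assumptions made for illustration. The sketch only shows the general idea of discarding a random subset of per-frame tokens and marking the middle frame as the one whose features would serve as the reconstruction target.

```python
import random


def drop_tokens(num_tokens, keep_ratio, rng):
    """Return sorted indices of the tokens that survive random dropping."""
    # keep_ratio is a hypothetical hyperparameter, not a value from the paper.
    num_keep = max(1, round(num_tokens * keep_ratio))
    return sorted(rng.sample(range(num_tokens), num_keep))


def mask_clip(clip, keep_ratio=0.25, seed=0):
    """Randomly drop tokens in every frame of a clip.

    clip: list of frames, each frame a list of token feature vectors.
    Returns (kept_tokens_per_frame, kept_indices_per_frame, middle_index).
    """
    rng = random.Random(seed)
    kept_tokens, kept_indices = [], []
    for frame in clip:
        # Assumption: each frame gets its own independently sampled mask.
        idx = drop_tokens(len(frame), keep_ratio, rng)
        kept_indices.append(idx)
        kept_tokens.append([frame[i] for i in idx])
    # Index of the middle frame, whose features act as the target here.
    middle = len(clip) // 2
    return kept_tokens, kept_indices, middle


# Toy clip: 3 frames, 16 tokens per frame, 4-dimensional features.
clip = [[[float(t * 16 + n)] * 4 for n in range(16)] for t in range(3)]
kept_tokens, kept_indices, middle = mask_clip(clip, keep_ratio=0.25)
```

In this toy setup each frame keeps 4 of its 16 tokens, so a downstream encoder would process a quarter of the input while the target for the middle frame stays unmasked.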

Videos

Self-supervised Object-Centric Learning for Videos · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Video Surveillance and Tracking Methods · Image Enhancement Techniques