Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

Rui Qian; Shuangrui Ding; Xian Liu; Dahua Lin

arXiv:2308.09951 · cs.CV · March 22, 2024 · 1 citation

TL;DR

This paper introduces a self-supervised method that combines semantic and temporal-correspondence cues to improve object-centric learning in videos, achieving state-of-the-art results on dense label propagation and promising results in unsupervised video object discovery.

Contribution

It proposes a semantic-aware masked slot attention mechanism that operates on fused semantic features and temporal correspondence maps to better identify individual object instances.

Findings

1. Effective identification of multiple object instances with semantic structure
2. State-of-the-art performance on dense label propagation tasks
3. Promising results in unsupervised video object discovery

Abstract

Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence. Building on these results, we take one step further and explore the possibility of integrating these two features to enhance object-centric representations. Our preliminary experiments indicate that query slot attention can extract different semantic components from the RGB feature map, while random-sampling-based slot attention can exploit temporal correspondence cues between frames to assist instance identification. Motivated by this, we propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps. It comprises two slot attention stages with a set of shared learnable Gaussian distributions. In the first stage, we use the mean vectors as slot initialization to decompose potential semantics and generate…
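The abstract contrasts two ways of initializing slots from shared Gaussian distributions: using the mean vectors directly (query-style) and sampling from them. The sketch below illustrates only that distinction, under our own simplifying assumptions. It uses plain dot-product slot attention (softmax over slots, weighted-mean slot update) with no learned projections, GRU, or MLP, and it omits the masking and the fusion of semantic features with correspondence maps, since the truncated abstract does not specify them. We also assume the sampling mode draws slots as mean plus scaled Gaussian noise. The helpers `init_slots`, `slot_attention`, and `assignments`, the `mode` parameter, and all toy values are hypothetical and are not taken from the official `smtc` repository.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def init_slots(means, log_sigmas, mode, rng):
    # Hypothetical helper: one shared Gaussian per slot, given by a mean
    # vector and per-dimension log standard deviations.
    if mode == "mean":
        # Query-style initialization: use the mean vectors directly.
        return [list(mu) for mu in means]
    if mode == "sample":
        # Assumed sampling initialization: mu + sigma * eps, eps ~ N(0, 1).
        return [[m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, lsig)]
                for mu, lsig in zip(means, log_sigmas)]
    raise ValueError(f"unknown mode: {mode}")

def slot_attention(features, slots, iters=3):
    # Simplified slot attention: no learned projections, GRU, or MLP.
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    attn = []
    for _ in range(iters):
        # Normalize over slots so that slots compete for each feature.
        attn = [softmax([scale * dot(f, s) for s in slots]) for f in features]
        new_slots = []
        for k in range(len(slots)):
            w = [a[k] for a in attn]
            total = sum(w) + 1e-8
            # Each slot becomes the attention-weighted mean of the features.
            new_slots.append([sum(wi * f[j] for wi, f in zip(w, features)) / total
                              for j in range(d)])
        slots = new_slots
    return slots, attn

def assignments(attn):
    # Hard assignment of each feature to its highest-weight slot.
    return [max(range(len(a)), key=a.__getitem__) for a in attn]

# Toy "feature map": two well-separated groups of 2-D features.
features = [[5.0, 0.0], [4.5, 0.5], [5.0, 0.2],
            [0.0, 5.0], [0.3, 4.8], [0.1, 5.2]]
means = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical shared Gaussian means
log_sigmas = [[-2.0, -2.0], [-2.0, -2.0]]  # hypothetical log standard deviations
rng = random.Random(0)

# Mean-vector initialization (query-style slots).
slots1, attn1 = slot_attention(features, init_slots(means, log_sigmas, "mean", rng))
# Sampling-based initialization from the same shared Gaussians.
slots2, attn2 = slot_attention(features, init_slots(means, log_sigmas, "sample", rng))
print(assignments(attn1), assignments(attn2))  # → [0, 0, 0, 1, 1, 1] [0, 0, 0, 1, 1, 1]
```

Because the toy features are well separated, both initializations recover the same two groups here. The paper's reason for combining the two modes (semantic decomposition versus correspondence-driven instance identification) depends on learned features and video frames that this sketch does not model.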

Code & Models

Repositories

shvdiwnkozbw/smtc (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques