Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

Mohammadreza Salehi; Efstratios Gavves; Cees G. M. Snoek; Yuki M. Asano

arXiv:2308.11796 · cs.CV · August 24, 2023

TL;DR

This paper introduces time-tuning, a self-supervised method that leverages temporal consistency in videos to enhance dense image representations, improving unsupervised segmentation performance on both videos and images.

Contribution

It proposes a novel temporal-alignment clustering loss for self-supervised learning, effectively transferring information from videos to improve image representations.

Findings

01 Improves unsupervised semantic segmentation by 8-10% on videos

02 Matches state-of-the-art performance on images

03 Leverages abundant video data for self-supervised learning

Abstract

Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos, this information-rich source has been largely overlooked. Our paper aims to address this gap by proposing a novel approach that incorporates temporal consistency in dense self-supervised learning. While methods designed solely for images face difficulties in achieving even the same performance on videos, our method improves the representation quality not only for videos but also for images. Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos. This effectively facilitates the transfer of high-level information from videos to…

Code & Models

Repositories

smsd75/timetuning
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques