Time-Contrastive Pretraining for In-Context Image and Video Segmentation

Assefa Wahd; Jacob Jaremko; Abhilash Hareendranathan

arXiv:2506.17837 · cs.CV · June 24, 2025

TL;DR

This paper introduces Temporal, a time-contrastive self-supervised objective for pretraining a prompt retriever, and casts in-context image and video segmentation as a video object segmentation task, which allows a variable number of full-resolution context images.

Contribution

The paper proposes a time-contrastive pretraining approach and reformulates in-context learning as a video object segmentation problem, so that the number of context images can vary and each image keeps its full resolution.

Findings

01 Achieves a 90.95% Dice score in image segmentation

02 Attains a 92.45% Dice score in video segmentation

03 Significantly outperforms baseline methods on MICCAI FLARE 2022
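
For readers unfamiliar with the metric behind these numbers, the Dice score measures the overlap between a predicted mask and a ground-truth mask as 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal illustration on flat binary masks. The function name `dice_score` and the convention for two empty masks are assumptions made here, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
print(f"Dice: {score:.4f}")
```

A reported score of 90.95% corresponds to a value of 0.9095 on this 0 to 1 scale.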

Abstract

In-context learning (ICL) enables generalization to new tasks with minimal labeled data. However, mainstream ICL approaches rely on a gridding strategy, which lacks the flexibility required for vision applications. We introduce Temporal, a time-contrastive self-supervised objective that pretrains a prompt retriever for visual ICL, and formulate ICL as a video object segmentation (VOS) task. Temporal addresses key limitations of grid-based methods that restrict the number and resolution of context images. By reframing ICL as a VOS problem, our approach supports a variable number of context images while preserving their full resolution. To address the challenge of selecting optimal context sets for queries, we pretrain a prompt retriever on videos via self-supervised learning, where adjacent frames serve as positives and distant frames as negatives. For image segmentation, the prompt…
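
The time-contrastive idea described above, adjacent frames as positives and distant frames as negatives, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt does not give the exact loss, so an InfoNCE-style loss over cosine similarities stands in for it, and `pos_window`, `neg_margin`, `temperature`, and all function names are hypothetical.

```python
import math
import random


def sample_triplet(num_frames, pos_window=2, neg_margin=10, rng=random):
    """Pick (anchor, positive, negative) frame indices from one video.

    Hypothetical hyperparameters: frames within pos_window of the anchor
    count as "adjacent", frames at least neg_margin away count as "distant".
    """
    if pos_window < 1 or num_frames <= 2 * neg_margin:
        raise ValueError("video too short for the chosen window and margin")
    anchor = rng.randrange(num_frames)
    lo = max(0, anchor - pos_window)
    hi = min(num_frames, anchor + pos_window + 1)
    positives = [i for i in range(lo, hi) if i != anchor]
    negatives = [i for i in range(num_frames) if abs(i - anchor) >= neg_margin]
    return anchor, rng.choice(positives), rng.choice(negatives)


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective).

    The loss is low when the anchor is closer to the positive embedding
    than to any of the negative embeddings.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings standing in for encoded video frames.
anchor_emb = [1.0, 0.0]
adjacent_emb = [0.9, 0.1]  # embedding of a nearby frame
distant_emb = [0.0, 1.0]   # embedding of a far-away frame

good = info_nce(anchor_emb, adjacent_emb, [distant_emb])
bad = info_nce(anchor_emb, distant_emb, [adjacent_emb])
print(f"loss with adjacent positive: {good:.4f}, with roles swapped: {bad:.4f}")
```

In a real pipeline the embeddings would come from the retriever network being pretrained, and the loss would be minimized over many sampled triplets.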
