From Static to Dynamic: Exploring Self-supervised Image-to-Video Representation Transfer Learning

Yang Liu; Qianqian Xu; Peisong Wen; Siran Dai; Xilin Zhao; Qingming Huang

arXiv:2603.26597·cs.CV·March 30, 2026

TL;DR

This paper introduces Co-Settle, a lightweight transfer learning framework that balances intra-video temporal consistency and inter-video semantic separability for improved video representation learning from image models.

Contribution

It proposes a novel lightweight projection layer with a cycle consistency and separability constraint, enabling effective self-supervised transfer from images to videos.

Findings

01. Delivers consistent improvements across multiple video tasks.
02. Achieves effective transfer with only five epochs of training.
03. Provides theoretical support for the trade-off optimization.

Abstract

Recent studies have made notable progress in video representation learning by transferring image-pretrained models to video tasks, typically with complex temporal modules and video fine-tuning. However, fine-tuning heavy modules may compromise inter-video semantic separability, i.e., the essential ability to distinguish objects across videos, while reducing the number of tunable parameters hinders intra-video temporal consistency, which is required for stable representations of the same object within a video. This dilemma indicates a potential trade-off between intra-video temporal consistency and inter-video semantic separability during image-to-video transfer. To this end, we propose the Consistency-Separability Trade-off Transfer Learning (Co-Settle) framework, which applies a lightweight projection layer on top of the frozen image-pretrained encoder to adjust the representation space…
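
The trade-off described above can be made concrete with a toy objective. The Python sketch below is not the Co-Settle loss from the paper. It assumes a frozen encoder (stubbed as the identity), a linear projection `W` as the only trainable component, a consistency term equal to 1 minus the cosine similarity between consecutive projected frames, and a separability penalty on the positive cosine similarity between per-video mean embeddings, weighted by a hypothetical coefficient `lam`. All function names are illustrative.

```python
import math


def matvec(W, x):
    """Apply a linear projection W (list of rows) to a feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a, b):
    """Cosine similarity; zero vectors are treated as having similarity 0."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def frozen_encoder(frame):
    """Hypothetical stand-in for a frozen image-pretrained encoder (identity here)."""
    return list(frame)


def tradeoff_loss(W, videos, lam=0.5):
    """Illustrative objective (not the paper's): consistency + lam * separability.

    videos: list of videos, each a list of per-frame feature vectors.
    Only the projection W would be trained; the encoder stays frozen.
    """
    z = [[matvec(W, frozen_encoder(f)) for f in vid] for vid in videos]

    # Intra-video temporal consistency: 1 - cos between consecutive frames.
    cons_terms = [1.0 - cosine(v[t], v[t + 1]) for v in z for t in range(len(v) - 1)]
    consistency = sum(cons_terms) / len(cons_terms) if cons_terms else 0.0

    # Inter-video separability: penalize positive cosine similarity
    # between the mean embeddings of different videos.
    means = [[sum(col) / len(v) for col in zip(*v)] for v in z]
    pairs = [(i, j) for i in range(len(means)) for j in range(i + 1, len(means))]
    sep_terms = [max(0.0, cosine(means[i], means[j])) for i, j in pairs]
    separability = sum(sep_terms) / len(sep_terms) if sep_terms else 0.0

    return consistency + lam * separability, consistency, separability
```

For two videos whose frames are `[1, 0]` and `[0, 1]` respectively, the identity projection yields zero for both terms. A projection such as `[[1, 1], [0, 0]]` maps every frame to the same direction. This makes temporal consistency perfect but raises the separability penalty to 1, which is the kind of collapse that an objective balancing the two terms is meant to avoid.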

Code & Models

Repositories

yafeng19/Co-Settle (GitHub)