Towards Data-Efficient Video Pre-training with Frozen Image Foundation Models

Svetlana Orlova, Niccolò Cavagnero, Gijs Dubbelman

arXiv:2605.19137 · cs.CV · May 20, 2026

TL;DR

This paper investigates a lightweight approach to video pre-training by freezing an image foundation model and training only a temporal module, aiming to reduce data and compute costs while maintaining strong performance.

Contribution

It introduces a new paradigm for video understanding: a pre-trained image foundation model is reused as a frozen spatial encoder, and only a recurrent temporal module is trained on top of it.

Findings

01 Strong temporal performance is achieved without large-scale video pre-training.

02 Reusing image foundation models reduces data and compute requirements.

03 Empirical results across multiple tasks support the approach's feasibility.

Abstract

Video foundation models achieve strong performance across many video understanding tasks, but typically require large-scale pre-training on massive video datasets, resulting in substantial data and compute costs. In contrast, modern image foundation models already provide powerful spatial representations. This raises an important question: can competitive video models be built by reusing these spatial representations and pre-training only for temporal reasoning? We take initial steps toward exploring a lightweight training paradigm that freezes a pre-trained image foundation model and trains only a recurrent temporal module to process streaming video. By reusing an image foundation model as a spatial encoder, this approach could significantly reduce the amount of video data and compute required compared to end-to-end video pre-training. In this work, we explore the feasibility of this…
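
The abstract describes the pipeline only at a high level. The sketch below shows the general shape of the idea under loudly stated assumptions: a fixed random projection stands in for the frozen image foundation model, and a plain GRU cell stands in for the recurrent temporal module. Neither choice is confirmed by the text above, and the names FrozenImageEncoder, GRUTemporalModule and process_stream are hypothetical; the authors' actual code is in the tue-mps/towards-video-image-frozen repository.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class FrozenImageEncoder:
    """Hypothetical stand-in for a pre-trained image foundation model.

    A fixed random projection plays the role of the spatial encoder; its
    weights are created once and never updated (i.e. frozen).
    """

    def __init__(self, frame_shape, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = int(np.prod(frame_shape))
        self.W = rng.standard_normal((in_dim, feat_dim)) / np.sqrt(in_dim)

    def __call__(self, frame):
        # Per-frame spatial features; no temporal context is used here.
        return np.tanh(frame.reshape(-1) @ self.W)


class GRUTemporalModule:
    """Hypothetical recurrent temporal module (assumed here to be a GRU cell).

    In this sketch, these are the only parameters that would be trained.
    """

    def __init__(self, feat_dim, hidden_dim, seed=1):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        self.params = {}
        for gate in ("z", "r", "h"):
            self.params["W" + gate] = rng.uniform(-s, s, (feat_dim, hidden_dim))
            self.params["U" + gate] = rng.uniform(-s, s, (hidden_dim, hidden_dim))
            self.params["b" + gate] = np.zeros(hidden_dim)
        self.hidden_dim = hidden_dim

    def num_trainable(self):
        return sum(p.size for p in self.params.values())

    def step(self, x, h):
        p = self.params
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])  # update gate
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])  # reset gate
        h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        return (1.0 - z) * h + z * h_cand  # blend old state and candidate


def process_stream(frames, encoder, temporal):
    """Consume frames one at a time, carrying a recurrent state (streaming)."""
    h = np.zeros(temporal.hidden_dim)
    states = []
    for frame in frames:
        feat = encoder(frame)        # frozen spatial features
        h = temporal.step(feat, h)   # trainable temporal update
        states.append(h)
    return np.stack(states)


encoder = FrozenImageEncoder(frame_shape=(8, 8, 3), feat_dim=32)
temporal = GRUTemporalModule(feat_dim=32, hidden_dim=16)
video = np.random.default_rng(2).standard_normal((5, 8, 8, 3))  # 5 toy frames
states = process_stream(video, encoder, temporal)
print(states.shape)              # (5, 16)
print(temporal.num_trainable())  # 3 * (32*16 + 16*16 + 16) = 2352
```

Because frames are consumed one at a time with a carried state, this kind of module can run on streaming video, and only the temporal module's parameters (2,352 in this toy configuration) would receive gradient updates; the encoder's weights are never touched.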

Code & Models

Repositories

tue-mps/towards-video-image-frozen (GitHub)

Models

tue-mps/towards-video-image-frozen (Hugging Face)