ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

Xinhao Li; Yuhan Zhu; Limin Wang

arXiv:2310.01324 · cs.CV · July 12, 2024 · 2 cites

TL;DR

ZeroI2V adapts pre-trained image transformers to video recognition without adding any cost at inference. It combines a spatial-temporal dual-headed attention mechanism with lightweight linear adapters, and reports accuracy competitive with state-of-the-art methods.

Contribution

The paper proposes a novel zero-cost adaptation paradigm for image-to-video recognition that requires no additional inference cost, utilizing dual-headed attention and linear adapters for effective transfer.
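
This summary does not spell out how the linear adapters avoid inference cost. One common way to get that property is to fold a purely linear adapter into the adjacent frozen linear layer once training is done. The sketch below illustrates that idea in plain Python. The residual form `y = h + A h`, the helper names, and the toy numbers are all assumptions made for illustration; this is not the authors' implementation.

```python
# Hypothetical sketch: merge a residual linear adapter into a frozen linear layer.
# Training-time path:  h = W x + b,  y = h + A h
# Inference-time path: y = W_m x + b_m, with W_m = (I + A) W and b_m = (I + A) b

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def add_identity(m):
    """Return I + m for a square matrix m."""
    return [[val + (1.0 if i == j else 0.0) for j, val in enumerate(row)]
            for i, row in enumerate(m)]

def forward_with_adapter(W, b, A, x):
    """Frozen layer followed by the residual linear adapter (assumed form)."""
    h = [hi + bi for hi, bi in zip(matvec(W, x), b)]
    return [hi + ai for hi, ai in zip(h, matvec(A, h))]

def merge_adapter(W, b, A):
    """Fold the adapter into the layer so only one linear map remains."""
    i_plus_a = add_identity(A)
    return matmul(i_plus_a, W), matvec(i_plus_a, b)

def forward_merged(W_m, b_m, x):
    """Single linear layer with the merged weights."""
    return [hi + bi for hi, bi in zip(matvec(W_m, x), b_m)]

# Toy values, chosen only to exercise the code.
W = [[1.0, 2.0], [0.0, -1.0]]   # frozen pre-trained weight
b = [0.5, -0.5]                 # frozen bias
A = [[0.1, 0.0], [0.2, -0.3]]   # trained adapter weight
x = [3.0, 4.0]

y_train = forward_with_adapter(W, b, A, x)
W_m, b_m = merge_adapter(W, b, A)
y_infer = forward_merged(W_m, b_m, x)

# Both paths give the same output up to floating-point error.
assert all(abs(p - q) < 1e-9 for p, q in zip(y_train, y_infer))
```

After merging, `W_m` has the same shape as `W`, so under these assumptions the deployed layer has the same parameter count and FLOPs as the original one. The trick depends on the adapter containing no nonlinearity; an adapter with an activation between its projections could not be folded this way.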

Findings

01. ZeroI2V matches or outperforms state-of-the-art methods on video recognition benchmarks.

02. It achieves this with no extra parameters or computational overhead during inference.

03. The approach is effective in both fully-supervised and few-shot learning scenarios.

Abstract

Adapting image models to the video domain has emerged as an efficient paradigm for solving video recognition tasks. Due to the huge number of parameters and effective transferability of image models, performing full fine-tuning is less efficient and even unnecessary. Thus, recent research is shifting its focus toward parameter-efficient image-to-video adaptation. However, these adaptation strategies inevitably introduce extra computational costs to deal with the domain gap and temporal modeling in videos. In this paper, we present a new adaptation paradigm (ZeroI2V) to transfer the image transformers to video recognition tasks (i.e., introduce zero extra cost to the original models during inference). To achieve this goal, we present two core designs. First, to capture the dynamics in videos and reduce the difficulty of image-to-video adaptation, we exploit the flexibility of…

Code & Models

Models

MCG-NJU/ZeroI2V (Hugging Face model)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Human Pose and Action Recognition