TDS-CLIP: Temporal Difference Side Network for Efficient Video Action Recognition

Bin Wang; Wentong Li; Wenqian Wang; Mingliang Gao; Runmin Cong; Wei Zhang

arXiv:2408.10688 · cs.CV · June 13, 2025



TL;DR

This paper introduces TDS-CLIP, a memory-efficient side network for video action recognition that uses adapters to enhance temporal modeling and motion feature learning without extensive backpropagation, achieving competitive results.

Contribution

The paper proposes TDS-CLIP, a framework with specialized adapters that improves temporal and motion feature learning for video action recognition while reducing training costs.

Findings

01

Achieves competitive accuracy on benchmark datasets.

02

Effectively captures local temporal differences in motion features.

03

Enhances motion information learning with minimal backpropagation.

Abstract

Recently, large-scale pre-trained vision-language models (e.g., CLIP) have garnered significant attention thanks to their powerful representation capabilities. This has inspired researchers to transfer the knowledge from these large pre-trained models to other task-specific models, e.g., Video Action Recognition (VAR) models, in particular by leveraging side networks to improve the efficiency of parameter-efficient fine-tuning (PEFT). However, current transfer approaches in VAR tend to directly transfer the frozen knowledge from large pre-trained models to action recognition networks at minimal cost, instead of exploiting the temporal modeling capabilities of the action recognition models themselves. Therefore, in this paper, we propose a novel memory-efficient Temporal Difference Side Network (TDS-CLIP) to balance knowledge transfer and temporal modeling, avoiding…
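The idea of learning motion cues from local temporal differences on top of frozen per-frame features can be illustrated with a small NumPy sketch. It is not the authors' implementation (the official PyTorch repository holds that); the functions `temporal_difference` and `difference_adapter`, the bottleneck size, and the zero-initialized up-projection are hypothetical choices made for illustration. The frozen frame features are treated as constant inputs, loosely mirroring how a side network reads from a frozen backbone instead of updating it.

```python
import numpy as np


def temporal_difference(feats):
    """Local temporal difference between adjacent frames (illustrative).

    feats: array of shape (T, D), one D-dim feature vector per frame.
    Returns an array of shape (T, D) where row t is feats[t + 1] - feats[t];
    the last row is zero so the output keeps the input's temporal length.
    """
    diff = np.zeros_like(feats)
    diff[:-1] = feats[1:] - feats[:-1]
    return diff


def difference_adapter(feats, w_down, w_up):
    """Hypothetical bottleneck adapter fed with temporal differences.

    Projects the frame differences down to a small hidden size, applies ReLU,
    projects back up, and adds the result to the input as a residual.
    """
    hidden = np.maximum(temporal_difference(feats) @ w_down, 0.0)
    return feats + hidden @ w_up


rng = np.random.default_rng(0)
T, D, r = 8, 16, 4  # frames, feature dim, bottleneck dim (assumed values)
frozen = rng.standard_normal((T, D))  # stands in for frozen CLIP frame features
w_down = rng.standard_normal((D, r)) * 0.1  # trainable down-projection (assumed)
w_up = np.zeros((r, D))  # zero-init up-projection: adapter starts as identity
out = difference_adapter(frozen, w_down, w_up)
print(out.shape)  # (8, 16)
```

Because `w_up` starts at zero in this sketch, the adapter initially returns the frozen features unchanged, so training would start from the pre-trained representation with only the small adapter weights being learned.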


Code & Models

Repositories

BBYL9413/TDS-CLIP
PyTorch · Official


Taxonomy

Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Image and Signal Denoising Methods

Methods: Softmax · Attention Is All You Need · Adapter