D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition

Wenjie Pei; Qizhong Tan; Guangming Lu; Jiandong Tian; Jun Yu

arXiv:2312.01431 · cs.CV · July 1, 2025 · 2 cites


TL;DR

This paper introduces D$^2$ST-Adapter, a lightweight adapter that encodes spatial and temporal features in separate pathways, adapting pre-trained image models to video and significantly improving few-shot action recognition.

Contribution

The paper proposes a novel D$^2$ST-Adapter with anisotropic deformable attention, enabling efficient disentangled spatio-temporal feature adaptation in pre-trained image models for video tasks.

Findings

1. Outperforms state-of-the-art methods on few-shot action recognition benchmarks.
2. Effective in scenarios with critical temporal dynamics.
3. Compatible with ResNet and ViT architectures.

Abstract

Adapting pre-trained image models to video modality has proven to be an effective strategy for robust few-shot action recognition. In this work, we explore the potential of adapter tuning in image-to-video model adaptation and propose a novel video adapter tuning framework, called Disentangled-and-Deformable Spatio-Temporal Adapter (D$^2$ST-Adapter). It features a lightweight design, low adaptation overhead and powerful spatio-temporal feature adaptation capabilities. D$^2$ST-Adapter is structured with an internal dual-pathway architecture that enables built-in disentangled encoding of spatial and temporal features within the adapter, seamlessly integrating into the single-stream feature learning framework of pre-trained image models. In particular, we develop an efficient yet effective implementation of the D$^2$ST-Adapter, incorporating the specially devised anisotropic Deformable…
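To make the dual-pathway idea in the abstract more concrete, here is a minimal PyTorch sketch of a bottleneck adapter with separate spatial and temporal pathways. It is an illustration under stated assumptions, not the authors' implementation: the class name `DualPathwayAdapter`, the bottleneck width, the summation used to fuse the pathways, and the zero-initialized up-projection are all hypothetical choices, and plain depthwise 3D convolutions stand in for the paper's anisotropic deformable attention, which the truncated abstract does not specify.

```python
import torch
import torch.nn as nn


class DualPathwayAdapter(nn.Module):
    """Hypothetical bottleneck adapter with separate spatial and temporal pathways.

    Not the paper's module: depthwise 3D convolutions are used here as a
    simple stand-in for the anisotropic deformable attention it describes.
    """

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        # Down-projection keeps the adapter lightweight.
        self.down = nn.Linear(dim, bottleneck)
        # Spatial pathway: mixes features only within each frame (1x3x3 kernel).
        self.spatial = nn.Conv3d(
            bottleneck, bottleneck, kernel_size=(1, 3, 3),
            padding=(0, 1, 1), groups=bottleneck,
        )
        # Temporal pathway: mixes features only across frames (3x1x1 kernel).
        self.temporal = nn.Conv3d(
            bottleneck, bottleneck, kernel_size=(3, 1, 1),
            padding=(1, 0, 0), groups=bottleneck,
        )
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        # Assumption: zero-init the up-projection so the adapter starts as an
        # identity mapping and does not disturb the frozen backbone at step 0.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height, width, dim) features from a backbone stage.
        z = self.act(self.down(x))               # (B, T, H, W, C')
        z = z.permute(0, 4, 1, 2, 3)             # (B, C', T, H, W) for Conv3d
        z = self.spatial(z) + self.temporal(z)   # assumed fusion: summation
        z = z.permute(0, 2, 3, 4, 1)             # back to (B, T, H, W, C')
        # Residual connection feeds the result back into the single stream.
        return x + self.up(z)
```

In a setup like this, the backbone weights would typically stay frozen and only the adapter parameters would be trained; the output has the same shape as the input, so the module can be dropped between existing backbone stages.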


Code & Models

Repositories

qizhongtan/d2st-adapter (PyTorch, official)


Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications

Methods: Batch Normalization · 1x1 Convolution · Max Pooling · Average Pooling · Global Average Pooling · Kaiming Initialization · Residual Block · Residual Connection · Convolution