D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei, Qizhong Tan, Guangming Lu, Jiandong Tian, Jun Yu

TL;DR
This paper introduces D$^2$ST-Adapter, a lightweight, disentangled spatio-temporal adapter for pre-trained image models, significantly improving few-shot action recognition by effectively capturing spatial and temporal features.
Contribution
The paper proposes a novel D$^2$ST-Adapter with anisotropic deformable attention, enabling efficient disentangled spatio-temporal feature adaptation in pre-trained image models for video tasks.
Findings
Outperforms state-of-the-art methods on few-shot action recognition benchmarks.
Effective in scenarios with critical temporal dynamics.
Compatible with ResNet and ViT architectures.
Abstract
Adapting pre-trained image models to video modality has proven to be an effective strategy for robust few-shot action recognition. In this work, we explore the potential of adapter tuning in image-to-video model adaptation and propose a novel video adapter tuning framework, called Disentangled-and-Deformable Spatio-Temporal Adapter (D$^2$ST-Adapter). It features a lightweight design, low adaptation overhead and powerful spatio-temporal feature adaptation capabilities. D$^2$ST-Adapter is structured with an internal dual-pathway architecture that enables built-in disentangled encoding of spatial and temporal features within the adapter, seamlessly integrating into the single-stream feature learning framework of pre-trained image models. In particular, we develop an efficient yet effective implementation of the D$^2$ST-Adapter, incorporating the specially devised anisotropic Deformable…
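To make the dual-pathway idea in the abstract more concrete, here is a minimal NumPy sketch of a bottleneck adapter with separate spatial and temporal branches. It is an illustration under stated assumptions, not the authors' implementation: the function name `dual_pathway_adapter`, the tensor layout `(T, N, C)`, the ReLU, and above all the use of plain mean pooling in each branch are hypothetical stand-ins. The paper's pathways use anisotropic deformable attention, which is not reproduced here.

```python
import numpy as np

def dual_pathway_adapter(x, w_down, w_up):
    """Hypothetical dual-pathway bottleneck adapter (illustrative only).

    x      : (T, N, C) features for T frames, N spatial tokens, C channels
    w_down : (C, D) down-projection into a small bottleneck of width D
    w_up   : (D, C) up-projection back to the backbone width
    """
    # Down-project so the adapter stays lightweight.
    z = x @ w_down                                # (T, N, D)

    # Spatial pathway (stand-in): pool over time, keep full spatial detail.
    spatial = z.mean(axis=0, keepdims=True)       # (1, N, D)

    # Temporal pathway (stand-in): pool over space, keep full temporal detail.
    temporal = z.mean(axis=1, keepdims=True)      # (T, 1, D)

    # Fuse both pathways with the bottleneck features via broadcasting.
    fused = np.maximum(z + spatial + temporal, 0.0)  # ReLU, (T, N, D)

    # Up-project and add back to the single-stream backbone features.
    return x + fused @ w_up                       # (T, N, C)

# Usage: 8 frames, 49 tokens, 64 channels, bottleneck width 16.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 49, 64))
w_down = rng.standard_normal((64, 16)) * 0.02
w_up = np.zeros((16, 64))  # zero init: the adapter starts as an identity map
y = dual_pathway_adapter(x, w_down, w_up)
```

Zero-initializing the up-projection is a common adapter-tuning choice (assumed here, not taken from the paper): the adapted model initially reproduces the frozen backbone's features, and only the small projection matrices need to be trained.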
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
Methods: Batch Normalization · 1x1 Convolution · Max Pooling · Average Pooling · Global Average Pooling · Kaiming Initialization · Residual Block · Residual Connection · Convolution
