Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition

Congqi Cao; Peiheng Han; Yueran Zhang; Yating Yu; Qinyi Lv; Lingtong Min; Yanning Zhang

arXiv:2505.06002·cs.CV·July 4, 2025

TL;DR

Task-Adapter++ introduces a dual adaptation framework for few-shot action recognition that enhances cross-modal alignment by incorporating task-specific, order-aware, and fine-grained strategies, achieving state-of-the-art results.

Contribution

It proposes a novel, parameter-efficient dual adaptation method with task-specific and order-aware modules for improved few-shot action recognition.

Findings

1. Achieves state-of-the-art performance on 5 benchmarks.
2. Effectively models semantic order in text descriptions.
3. Enhances cross-modal alignment with fine-grained strategies.

Abstract

Large-scale pre-trained models have achieved remarkable success in language and image tasks, leading an increasing number of studies to explore the application of pre-trained image models, such as CLIP, in the domain of few-shot action recognition (FSAR). However, current methods generally suffer from several problems: 1) Direct fine-tuning often undermines the generalization capability of the pre-trained model; 2) The exploration of task-specific information is insufficient in visual tasks; 3) The semantic order information is typically overlooked during text modeling; 4) Existing cross-modal alignment techniques ignore the temporal coupling of multimodal information. To address these, we propose Task-Adapter++, a parameter-efficient dual adaptation method for both image and text encoders. Specifically, to make full use of the variations across different few-shot learning tasks, we…

Code & Models

Repositories

jaulin-bage/task-adapter-pp (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training