Task-Specific Distance Correlation Matching for Few-Shot Action Recognition
Fei Long, Yao Zhang, Jiaming Lv, Jiangtao Xie, and Peihua Li

TL;DR
This paper introduces TS-FSAR, a novel framework for few-shot action recognition that models complex inter-frame dependencies using task-specific distance correlation and improves CLIP adaptation with a Ladder Side Network and regularization.
Contribution
The paper proposes TS-FSAR, combining a Ladder Side Network, task-specific distance correlation matching, and a guiding module for improved few-shot action recognition.
Findings
Outperforms prior state-of-the-art methods on five benchmarks.
Effectively models both linear and nonlinear dependencies in video frames.
Enhances CLIP adaptation with task-specific regularization.
Abstract
Few-shot action recognition (FSAR) has recently made notable progress through set matching and efficient adaptation of large-scale pre-trained models. However, two key limitations persist. First, existing set matching metrics typically rely on cosine similarity to measure inter-frame linear dependencies and then perform matching with only instance-level information, thus failing to capture more complex patterns such as nonlinear relationships and overlooking task-specific cues. Second, for efficient adaptation of CLIP to FSAR, recent work performing fine-tuning via skip-fusion layers (which we refer to as side layers) has significantly reduced memory cost. However, the newly introduced side layers are often difficult to optimize under limited data conditions. To address these limitations, we propose TS-FSAR, a framework comprising three components: (1) a visual Ladder Side Network (LSN)…
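The contrast the abstract draws between linear and nonlinear dependence can be illustrated with the classical sample distance correlation. The following is a minimal sketch on paired scalar sequences using only the standard library. It is not the task-specific variant proposed in the paper, whose details are not given in this excerpt, and the function names `distance_correlation` and `pearson_correlation` are hypothetical, chosen for this example.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered (row, column, grand mean)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two paired scalar sequences (in [0, 1])."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 samples")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # one of the sequences is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


def pearson_correlation(x, y):
    """Pearson correlation, i.e. cosine similarity of the mean-centered sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cx = [v - mean_x for v in x]
    cy = [v - mean_y for v in y]
    norm = math.sqrt(sum(v * v for v in cx) * sum(v * v for v in cy))
    if norm == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(cx, cy)) / norm


# Toy data (illustrative only): y depends on x purely nonlinearly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]

linear_score = pearson_correlation(x, y)     # vanishes for this symmetric parabola
nonlinear_score = distance_correlation(x, y)  # stays clearly positive
```

On this toy input the linear measure is zero even though `y` is fully determined by `x`, while the distance correlation remains well above zero. That is the kind of dependence a cosine-style metric can miss.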
Taxonomy
Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Gait Recognition and Analysis
