Distinguishing Visually Similar Actions: Prompt-Guided Semantic Prototype Modulation for Few-Shot Action Recognition
Xiaoyang Li, Mingming Lu, Ruiqi Wang, Hao Li, Zewei Le

TL;DR
This paper introduces CLIP-SPM, a few-shot action recognition framework whose three components improve temporal modeling, make visually similar actions easier to distinguish, and bridge the modality gap between support prototypes and queries.
Contribution
The paper proposes the CLIP-SPM framework, whose three components address key challenges in few-shot action recognition: temporal modeling, visual similarity, and the modality gap between visual-textual support prototypes and visual-only queries.
Findings
Achieves competitive results on multiple benchmarks.
Demonstrates effectiveness of each component through ablation studies.
Validates improved discriminability of similar actions.
Abstract
Few-shot action recognition aims to enable models to quickly learn new action categories from limited labeled samples, addressing the challenge of data scarcity in real-world applications. Current research primarily addresses three core challenges: (1) temporal modeling, where models are prone to interference from irrelevant static background information and struggle to capture the essence of dynamic action features; (2) visual similarity, where categories with subtle visual differences are difficult to distinguish; and (3) the modality gap between visual-textual support prototypes and visual-only queries, which complicates alignment within a shared embedding space. To address these challenges, this paper proposes a CLIP-SPM framework, which includes three components: (1) the Hierarchical Synergistic Motion Refinement (HSMR) module, which aligns deep and shallow motion features to…
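The modality gap described above can be made concrete with a minimal sketch of generic prototype matching. This is not the CLIP-SPM method. It assumes that video and text features have already been extracted into a shared embedding space, and that each support prototype is fused with its class text feature by a simple weighted average. The function names and the `alpha` weight are hypothetical and chosen for illustration only. Queries stay visual-only, which is the asymmetry the abstract refers to.

```python
import math

def l2_normalize(v, eps=1e-8):
    # Scale a vector to (approximately) unit length.
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]

def mean_vector(vectors):
    # Element-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_prototypes(support, text_feats=None, alpha=0.5):
    """Build one prototype per class.

    support: dict mapping class id -> list of visual feature vectors.
    text_feats: optional dict mapping class id -> text feature vector.
    alpha: assumed fusion weight for the text feature (illustrative only).
    """
    protos = {}
    for cls, feats in support.items():
        proto = l2_normalize(mean_vector(feats))
        if text_feats is not None:
            text = l2_normalize(text_feats[cls])
            # Assumed fusion: weighted average of visual and text features.
            fused = [(1 - alpha) * p + alpha * t for p, t in zip(proto, text)]
            proto = l2_normalize(fused)
        protos[cls] = proto
    return protos

def classify(query, protos):
    # The query has no text feature; match it by cosine similarity.
    q = l2_normalize(query)
    return max(protos, key=lambda cls: sum(a * b for a, b in zip(q, protos[cls])))

# Toy 2-way, 2-shot episode with 3-dimensional features.
support = {
    0: [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    1: [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
text = {0: [1.0, 0.0, 0.1], 1: [0.0, 1.0, 0.1]}
protos = build_prototypes(support, text)
predicted = classify([0.8, 0.2, 0.0], protos)
```

Because the fused prototypes carry a text component that the visual-only query lacks, the two sides are not built from the same information, which is one way to picture why alignment in the shared space is harder than in a purely visual setup.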
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
