STAR: Semantic-Temporal Adaptive Representation Learning for Few-Shot Action Recognition

Hongli Liu; Yu Wang; Shengjie Zhao

arXiv:2605.13202·cs.CV·May 14, 2026

TL;DR

STAR is a unified semantic-temporal framework for few-shot action recognition that aligns visual and textual cues and models multi-scale temporal dynamics to improve recognition accuracy.

Contribution

The paper proposes new modules for semantic alignment and temporal modeling, integrating large language models and attention mechanisms to improve few-shot action recognition.

Findings

01. Achieves up to an 8.1% accuracy improvement on the SSv2-Full dataset.

02. Demonstrates consistent superiority over state-of-the-art methods across five benchmarks.

03. Shows significant gains under limited supervision, supporting the effectiveness of the approach.

Abstract

Few-shot action recognition (FSAR) requires models to generalize to novel action categories from only a handful of annotated samples. Despite progress with vision-language models, existing approaches still suffer from two problems. The first is semantic-temporal misalignment, where static textual prompts fail to capture decisive visual cues that appear sparsely across sequences. The second is inadequate modeling of multi-scale temporal dynamics, as short-term discriminative cues and long-range dependencies are often either oversmoothed or fragmented. To address these challenges, we propose Semantic-Temporal Adaptive Representation Learning (STAR), a unified framework consisting of a semantic-alignment component and a temporal-aware component. STAR bridges the semantic and temporal gaps and transfers the sequence modeling capability of Mamba to FSAR. The semantic alignment module introduces a Temporal…
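For readers new to the setting, the few-shot episode described in the first sentence of the abstract can be sketched as follows. This is a generic prototype-matching baseline, not STAR's method: the feature vectors are synthetic, and the names `mean_pool`, `classify_episode`, and `make_clip`, along with the three action labels, are hypothetical and chosen only for illustration. Each class gets a few labeled support clips, and a query clip is assigned to the class whose averaged feature is most similar.

```python
import math
import random


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def classify_episode(support, query_clip):
    """Assign a query clip to the nearest class prototype.

    support: dict mapping label -> list of clips; a clip is a list of
    per-frame feature vectors. This is a generic baseline, not STAR.
    """
    prototypes = {}
    for label, clips in support.items():
        clip_features = [mean_pool(clip) for clip in clips]  # pool over frames
        prototypes[label] = mean_pool(clip_features)         # pool over shots
    query_feature = mean_pool(query_clip)
    return max(prototypes, key=lambda lab: cosine(prototypes[lab], query_feature))


def make_clip(center, rng, n_frames=8, noise=0.1):
    """Synthetic stand-in for per-frame features from a video backbone."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n_frames)]


# A 3-way 2-shot episode with hypothetical action labels.
rng = random.Random(0)
centers = {
    "open_door": [1.0, 0.0, 0.0],
    "close_door": [0.0, 1.0, 0.0],
    "pour_water": [0.0, 0.0, 1.0],
}
support = {label: [make_clip(c, rng) for _ in range(2)] for label, c in centers.items()}
query = make_clip(centers["pour_water"], rng)
prediction = classify_episode(support, query)
print(prediction)
```

Note that mean pooling over frames discards temporal order entirely, which is one simple instance of the oversmoothing the abstract criticizes in existing approaches.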

Code & Models

Repositories

HongliLiu1/STAR-main (GitHub)
