Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition

Pulkit Kumar; Shuaiyi Huang; Matthew Walmer; Sai Saketh Rambhatla; Abhinav Shrivastava

arXiv:2508.03695 · cs.CV · August 6, 2025

TL;DR

Trokens introduces semantic-aware relational trajectory tokens that adaptively select and model motion patterns for improved few-shot action recognition across multiple benchmarks.

Contribution

The paper proposes a novel semantic-aware sampling and motion modeling framework that enhances trajectory-based features for few-shot action recognition.

Findings

1. Achieves state-of-the-art results on six benchmarks.

2. Effectively models intra- and inter-trajectory motion patterns.

3. Improves recognition accuracy by combining semantic and motion features.

Abstract

Video understanding requires effective modeling of both motion and appearance information, particularly for few-shot action recognition. While recent advances in point tracking have been shown to improve few-shot action recognition, two fundamental challenges persist: selecting informative points to track and effectively modeling their motion patterns. We present Trokens, a novel approach that transforms trajectory points into semantic-aware relational tokens for action recognition. First, we introduce a semantic-aware sampling strategy to adaptively distribute tracking points based on object scale and semantic relevance. Second, we develop a motion modeling framework that captures both intra-trajectory dynamics through the Histogram of Oriented Displacements (HoD) and inter-trajectory relationships to model complex action patterns. Our approach effectively combines these trajectory…
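To make the intra-trajectory descriptor named in the abstract more concrete, the following is a minimal sketch of a Histogram of Oriented Displacements for a single 2D point track. It is an illustration under stated assumptions, not the authors' implementation: the function name `histogram_of_oriented_displacements`, the choice of 8 orientation bins, the magnitude weighting, and the L1 normalization are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def histogram_of_oriented_displacements(track, num_bins=8):
    """Sketch of a HoD descriptor for one point trajectory.

    track: array-like of shape (T, 2) holding (x, y) positions over T frames.
    Returns an array of shape (num_bins,). Bin count, magnitude weighting,
    and L1 normalization are assumptions made for illustration only.
    """
    track = np.asarray(track, dtype=np.float64)
    # Frame-to-frame displacement vectors, shape (T-1, 2).
    disp = np.diff(track, axis=0)
    # Length of each displacement; used as the histogram weight.
    mag = np.linalg.norm(disp, axis=1)
    # Orientation of each displacement, mapped to [0, 2*pi).
    ang = np.arctan2(disp[:, 1], disp[:, 0]) % (2 * np.pi)
    # Assign each orientation to a bin; the modulo guards the 2*pi edge case.
    bins = np.floor(ang / (2 * np.pi) * num_bins).astype(int) % num_bins
    # Accumulate displacement magnitudes per orientation bin.
    hist = np.bincount(bins, weights=mag, minlength=num_bins)
    total = hist.sum()
    # Normalize so the histogram sums to 1; a static track stays all zeros.
    return hist / total if total > 0 else hist

# Example: a point moving steadily to the right.
rightward = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
hod = histogram_of_oriented_displacements(rightward)
```

For the rightward track, all displacement mass falls into the first orientation bin. In a full pipeline, one such histogram per tracked point would presumably be computed over temporal windows and combined with appearance features, but those details are not recoverable from the truncated abstract.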

Code & Models

Datasets

pulkitkumar95/trokens_pt_data
dataset · 409 dl

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning