UTAL-GNN: Unsupervised Temporal Action Localization using Graph Neural Networks
Bikash Kumar Badatya, Vipul Baghel, Ravi Hegde

TL;DR
This paper presents an unsupervised, lightweight skeleton-based GNN approach for fine-grained action localization in sports videos, achieving high accuracy and real-time performance without manual annotations.
Contribution
It introduces a novel unsupervised spatio-temporal GNN framework with a new action dynamics metric for boundary detection, enabling effective localization without labeled data.
Findings
Achieves 82.66% mAP on the DSV Diving dataset.
Operates with an average latency of 29.09 ms.
Generalizes well to unseen in-the-wild videos.
Abstract
Fine-grained action localization in untrimmed sports videos presents a significant challenge due to rapid and subtle motion transitions over short durations. Existing supervised and weakly supervised solutions often rely on extensive annotated datasets and high-capacity models, making them computationally intensive and less adaptable to real-world scenarios. In this work, we introduce a lightweight and unsupervised skeleton-based action localization pipeline that leverages spatio-temporal graph neural representations. Our approach pre-trains an Attention-based Spatio-Temporal Graph Convolutional Network (ASTGCN) on a pose-sequence denoising task with blockwise partitions, enabling it to learn intrinsic motion dynamics without any manual labeling. At inference, we define a novel Action Dynamics Metric (ADM), computed directly from low-dimensional ASTGCN embeddings, which detects motion…
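The denoising pre-training task mentioned in the abstract can be pictured with a small sketch. The abstract does not say how the blockwise partitions are formed or what noise is applied, so everything here is an assumption made for illustration: contiguous temporal blocks, additive Gaussian noise, and the hypothetical names `corrupt_blockwise`, `block_len`, `noise_std`, and `corrupt_frac`. The ASTGCN itself is not reproduced; a model would be trained to map the noisy sequence back to the clean one.

```python
import numpy as np

def corrupt_blockwise(poses, block_len=8, noise_std=0.05, corrupt_frac=0.5, seed=0):
    """Add Gaussian noise to randomly chosen temporal blocks of a pose sequence.

    poses: float array of shape (T, J, C) = (frames, joints, coordinates).
    Returns (noisy, mask), where mask[t] is True for corrupted frames.
    NOTE: the partitioning and noise model are illustrative assumptions,
    not the procedure described in the paper.
    """
    rng = np.random.default_rng(seed)
    num_frames = poses.shape[0]
    num_blocks = int(np.ceil(num_frames / block_len))
    num_corrupt = max(1, int(round(corrupt_frac * num_blocks)))

    # Pick which contiguous blocks to corrupt.
    chosen = rng.choice(num_blocks, size=num_corrupt, replace=False)
    mask = np.zeros(num_frames, dtype=bool)
    for b in chosen:
        mask[b * block_len:(b + 1) * block_len] = True

    noisy = poses.copy()
    noisy[mask] += rng.normal(0.0, noise_std, size=noisy[mask].shape)
    return noisy, mask

def denoising_loss(reconstructed, clean):
    """Mean squared error between a reconstruction and the clean sequence."""
    return float(np.mean((reconstructed - clean) ** 2))

if __name__ == "__main__":
    clean = np.zeros((32, 17, 2))          # 32 frames, 17 joints, 2D coordinates
    noisy, mask = corrupt_blockwise(clean)
    print("corrupted frames:", int(mask.sum()))
    print("loss of identity 'model':", denoising_loss(noisy, clean))
```

Because no labels are involved, the only supervision signal in such a setup is the clean sequence itself, which is consistent with the unsupervised framing in the abstract.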
