Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization

Umer Ahmed; Syed Ahmed Mahmood; Fawad Javed Fateh; M. Shaheer Luqman; M. Zeeshan Zia; Quoc-Huy Tran

arXiv:2604.15196·cs.CV·April 17, 2026

TL;DR

This paper introduces a hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based action segmentation. It clusters at two consecutive levels while recovering both the input skeletons and their timestamps, and it reports state-of-the-art results on HuGaDB, LARa, and BABEL.

Contribution

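The paper presents a hierarchical approach with two consecutive levels of vector quantization: the lower level associates skeletons with fine-grained subactions, and the higher level aggregates subactions into action-level representations. Recovering timestamps alongside the input skeletons adds temporal information to the spatial cues.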

Findings

01  Outperforms non-hierarchical baselines on multiple benchmarks.

02  Reduces segment length bias in action segmentation.

03  Establishes new state-of-the-art performance on HuGaDB, LARa, and BABEL.

Abstract

We propose a novel hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation. We first introduce a hierarchical approach, which includes two consecutive levels of vector quantization. Specifically, the lower level associates skeletons with fine-grained subactions, while the higher level further aggregates subactions into action-level representations. Our hierarchical approach outperforms the non-hierarchical baseline, while primarily exploiting spatial cues by reconstructing input skeletons. Next, we extend our approach by leveraging both spatial and temporal information, yielding a hierarchical spatiotemporal vector quantization scheme. In particular, our hierarchical spatiotemporal approach performs multi-level clustering, while simultaneously recovering input skeletons and their corresponding timestamps. Lastly, extensive…
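
The two-level scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the nearest-neighbor assignment rule, the mean-squared-error form of the loss, and every name in it (`quantize`, `hierarchical_quantize`, `spatiotemporal_loss`, `time_weight`, the toy codebooks) are assumptions made for illustration. The encoder, decoder, and codebook training are omitted entirely.

```python
import numpy as np


def quantize(features, codebook):
    """Assign each row of `features` to its nearest codebook entry (Euclidean)."""
    # Pairwise squared distances, shape (num_rows, num_codes).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


def hierarchical_quantize(frame_embeddings, subaction_codebook, action_codebook):
    """Two consecutive quantization levels (assumed form, for illustration)."""
    # Lower level: per-frame embeddings -> fine-grained subaction codes.
    subaction_ids, subaction_vecs = quantize(frame_embeddings, subaction_codebook)
    # Higher level: quantized subaction vectors -> action-level codes.
    action_ids, _ = quantize(subaction_vecs, action_codebook)
    return subaction_ids, action_ids


def spatiotemporal_loss(pred_skeletons, skeletons, pred_times, times, time_weight=1.0):
    """Skeleton reconstruction error plus timestamp recovery error.

    Both terms are plain mean squared errors here; that choice and the
    `time_weight` factor are assumptions, not taken from the paper.
    """
    spatial = np.mean((pred_skeletons - skeletons) ** 2)
    temporal = np.mean((pred_times - times) ** 2)
    return spatial + time_weight * temporal


# Toy data: five frames embedded in 2-D, three subaction codes, two action codes.
frames = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [5.0, 5.0], [5.2, 5.0]])
subaction_codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
action_codebook = np.array([[0.5, 0.5], [5.0, 5.0]])

subaction_ids, action_ids = hierarchical_quantize(
    frames, subaction_codebook, action_codebook
)
print("subaction ids:", subaction_ids.tolist())
print("action ids:", action_ids.tolist())
```

In this toy setup, consecutive frames that share an action code would form one segment, and the first two subaction codes fall under the same action code, which is the aggregation behavior the higher level is meant to provide.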
