Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization
Umer Ahmed, Syed Ahmed Mahmood, Fawad Javed Fateh, M. Shaheer Luqman, M. Zeeshan Zia, Quoc-Huy Tran

TL;DR
This paper introduces a hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based action segmentation, achieving state-of-the-art results by effectively capturing spatial and temporal cues.
Contribution
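The paper presents a two-level vector quantization approach for unsupervised skeleton-based action segmentation: the lower level assigns skeletons to fine-grained subactions, the higher level aggregates subactions into action-level representations, and the model reconstructs both the input skeletons and their timestamps to exploit spatial and temporal cues.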
Findings
Outperforms non-hierarchical baselines on multiple benchmarks.
Reduces segment length bias in action segmentation.
Establishes new state-of-the-art performance on HuGaDB, LARa, and BABEL.
Abstract
We propose a novel hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation. We first introduce a hierarchical approach, which includes two consecutive levels of vector quantization. Specifically, the lower level associates skeletons with fine-grained subactions, while the higher level further aggregates subactions into action-level representations. Our hierarchical approach outperforms the non-hierarchical baseline, while primarily exploiting spatial cues by reconstructing input skeletons. Next, we extend our approach by leveraging both spatial and temporal information, yielding a hierarchical spatiotemporal vector quantization scheme. In particular, our hierarchical spatiotemporal approach performs multi-level clustering, while simultaneously recovering input skeletons and their corresponding timestamps. Lastly, extensive…
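The two consecutive levels of vector quantization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, codebook sizes, feature dimension, and random codebooks are all assumptions made for illustration, and the encoder, the decoder that recovers skeletons and timestamps, and the training losses are omitted. Each frame embedding is assigned to its nearest subaction code, and each selected subaction code is in turn assigned to its nearest action code.

```python
import numpy as np

def quantize(x, codebook):
    """Assign each row of x to its nearest codebook entry (squared Euclidean)."""
    # dists[i, k] = ||x[i] - codebook[k]||^2
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def hierarchical_quantize(z, sub_codebook, act_codebook):
    """Two consecutive quantization levels: frames -> subactions -> actions."""
    # Lower level: per-frame embeddings to fine-grained subaction codes.
    sub_idx, z_sub = quantize(z, sub_codebook)
    # Higher level: the selected subaction codes to action-level codes.
    act_idx, z_act = quantize(z_sub, act_codebook)
    return sub_idx, act_idx, z_act

# Hypothetical sizes: 200 frames, 16-dim embeddings, 12 subactions, 4 actions.
rng = np.random.default_rng(0)
T, D = 200, 16
z = rng.normal(size=(T, D))             # stand-in for encoder outputs
sub_codebook = rng.normal(size=(12, D))  # assumed random, untrained codebook
act_codebook = rng.normal(size=(4, D))   # assumed random, untrained codebook

sub_idx, act_idx, z_q = hierarchical_quantize(z, sub_codebook, act_codebook)
```

In this sketch every frame that shares a subaction code also shares an action code, because the higher level quantizes the subaction code vector itself. That is one simple way to realize "aggregating subactions into actions"; the paper may couple the two levels differently.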
