Point-Supervised Skeleton-Based Human Action Segmentation

Hongsong Wang; Yiqin Shen; Pengbo Yan; Jie Gui

arXiv:2603.06201 · cs.CV · March 9, 2026

TL;DR

This paper introduces a point-supervised framework for skeleton-based human action segmentation that requires only one labeled frame per action segment. It combines multimodal skeleton data (joint, bone, and motion) with pseudo-labeling to achieve competitive results with far less annotation effort.

Contribution

The paper proposes a novel point-supervised approach using multimodal skeleton data and a prototype similarity method, reducing annotation costs while maintaining high segmentation performance.

Findings

1. Achieves competitive performance with fewer annotations.
2. Outperforms some fully-supervised methods on benchmark datasets.
3. Establishes new benchmarks for point-supervised segmentation.

Abstract

Skeleton-based temporal action segmentation is a fundamental yet challenging task, playing a crucial role in enabling intelligent systems to perceive and respond to human activities. While fully-supervised methods achieve satisfactory performance, they require costly frame-level annotations and are sensitive to ambiguous action boundaries. To address these issues, we introduce a point-supervised framework for skeleton-based action segmentation, where only a single frame per action segment is labeled. We leverage multimodal skeleton data, including joint, bone, and motion information, encoded via a pretrained unified model to extract rich feature representations. To generate reliable pseudo-labels, we propose a novel prototype similarity method and integrate it with two existing methods: energy function and constrained K-Medoids clustering. Multimodal pseudo-label integration is proposed…
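
As a rough illustration of how single-frame point labels might be expanded into frame-level pseudo-labels, the sketch below treats each labeled frame's feature vector as a prototype and, between two neighbouring labeled frames, picks the boundary that maximizes the total cosine similarity of frames to the prototype on their side. This is not the authors' prototype similarity method, whose details are not given in this listing; the function names `cosine` and `pseudo_labels`, the choice of cosine similarity, and the boundary-search rule are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pseudo_labels(features, points):
    """Expand point labels into frame-level pseudo-labels (hypothetical helper).

    features: list of per-frame feature vectors (e.g. from a pretrained encoder).
    points:   list of (frame_index, label) pairs, one per action segment.
    """
    points = sorted(points)
    labels = [None] * len(features)

    # Frames up to the first labeled frame inherit its label.
    first_t, first_label = points[0]
    for t in range(first_t + 1):
        labels[t] = first_label

    # Between consecutive labeled frames, search for the best boundary b:
    # frames t1..b take the left label, frames b+1..t2 take the right label.
    for (t1, l1), (t2, l2) in zip(points, points[1:]):
        proto_left, proto_right = features[t1], features[t2]
        best_b, best_score = t1, -math.inf
        for b in range(t1, t2):
            score = sum(cosine(features[t], proto_left) for t in range(t1, b + 1))
            score += sum(cosine(features[t], proto_right) for t in range(b + 1, t2 + 1))
            if score > best_score:
                best_b, best_score = b, score
        for t in range(t1, best_b + 1):
            labels[t] = l1
        for t in range(best_b + 1, t2 + 1):
            labels[t] = l2

    # Frames after the last labeled frame inherit its label.
    last_t, last_label = points[-1]
    for t in range(last_t, len(features)):
        labels[t] = last_label
    return labels


# Toy usage: six frames, two action segments, one labeled frame in each.
toy_features = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
toy_labels = pseudo_labels(toy_features, [(1, "A"), (4, "B")])
```

Restricting each gap to a single boundary keeps the pseudo-labels temporally contiguous, which per-frame nearest-prototype assignment would not guarantee. In a multimodal setting, one could run such a routine separately on joint, bone, and motion features and then combine the results, though how the paper integrates them is not described here.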

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Context-Aware Activity Recognition Systems