Skeleton-Snippet Contrastive Learning with Multiscale Feature Fusion for Action Localization

Qiushuo Cheng; Jingjing Liu; Catherine Morgan; Alan Whone; Majid Mirmehdi

arXiv:2512.16504 · cs.CV · May 6, 2026

TL;DR

This paper introduces a snippet-level contrastive pretraining method with multiscale feature fusion for skeleton-based action localization. It improves boundary detection on BABEL and achieves state-of-the-art transfer learning results on PKUMMD.

Contribution

It proposes a snippet discrimination pretext task and a U-shaped feature fusion module to enhance skeleton-based action localization.

Findings

01 Improves action localization performance on the BABEL dataset.

02 Achieves state-of-the-art transfer learning results on PKUMMD.

03 Enhances feature resolution for frame-level localization.

Abstract

The self-supervised pretraining paradigm has achieved great success in learning 3D action representations for skeleton-based action recognition using contrastive learning. However, learning effective representations for skeleton-based temporal action localization remains challenging and underexplored. Unlike video-level action recognition, detecting action boundaries requires temporally sensitive features that capture subtle differences between adjacent frames where labels change. To this end, we formulate a snippet discrimination pretext task for self-supervised pretraining, which densely projects skeleton sequences into non-overlapping segments and promotes features that distinguish them across videos via contrastive learning. Additionally, we build on strong backbones of skeleton-based action recognition models by fusing intermediate features with a U-shaped module to enhance…
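
The snippet discrimination idea can be illustrated with a small sketch. This is not the authors' implementation: the function names, the mean-pooling stand-in for an encoder, the choice of positives (the same snippet seen in a second view), and the temperature value are all assumptions made for illustration. It only shows the general shape of an InfoNCE-style loss computed over non-overlapping snippets rather than over whole sequences.

```python
import math


def split_into_snippets(frames, snippet_len):
    """Cut a sequence into non-overlapping snippets of snippet_len frames.

    Assumption: trailing frames that do not fill a snippet are dropped.
    """
    n = len(frames) // snippet_len
    return [frames[i * snippet_len:(i + 1) * snippet_len] for i in range(n)]


def mean_pool(snippet):
    """Hypothetical stand-in for an encoder: average the per-frame vectors."""
    dim = len(snippet[0])
    return [sum(frame[d] for frame in snippet) / len(snippet) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def snippet_info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE loss over snippet embeddings from two views.

    Assumption: view_a[i] and view_b[i] embed the same snippet (positive
    pair); every other entry of view_b is a negative for view_a[i].
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy usage: 8 frames of 2-D features, cut into snippets of 4 frames.
sequence = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
snippets = split_into_snippets(sequence, snippet_len=4)
embeddings = [mean_pool(s) for s in snippets]

# Matching snippets across views gives a low loss; swapping them gives a high one.
aligned_loss = snippet_info_nce(embeddings, embeddings)
swapped_loss = snippet_info_nce(embeddings, list(reversed(embeddings)))
```

In this toy setup the two snippets have orthogonal embeddings, so the loss is small when each snippet is paired with itself and large when the pairing is swapped. In practice the second view would come from an augmented copy of the skeleton sequence and the negatives would include snippets from other videos in the batch.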
