Less is More: Decoder-Free Masked Modeling for Efficient Skeleton Representation Learning

Jeonghyeok Do; Yun Chen; Geunhyuk Youk; Munchurl Kim

arXiv:2603.10648 · cs.CV · March 13, 2026

TL;DR

SLiM introduces a decoder-free masked modeling framework for skeleton representation learning, combining contrastive learning with masked modeling, leading to state-of-the-art accuracy and significantly improved efficiency.

Contribution

SLiM is the first framework to apply decoder-free masked modeling to skeleton representation learning, unifying it with contrastive learning through a shared encoder.

Findings

01. Achieves state-of-the-art performance on downstream tasks.

02. Reduces inference computational cost by 7.89x compared to existing MAE methods.

03. Effectively captures discriminative features without a decoder.

Abstract

The landscape of skeleton-based action representation learning has evolved from Contrastive Learning (CL) to Masked Auto-Encoder (MAE) architectures. However, each paradigm faces inherent limitations: CL often overlooks fine-grained local details, while MAE is burdened by computationally heavy decoders. Moreover, MAE suffers from severe computational asymmetry -- benefiting from efficient masking during pre-training but requiring exhaustive full-sequence processing for downstream tasks. To resolve these bottlenecks, we propose SLiM (Skeleton Less is More), a novel unified framework that harmonizes masked modeling with contrastive learning via a shared encoder. By eschewing the reconstruction decoder, SLiM not only eliminates computational redundancy but also compels the encoder to capture discriminative features directly. SLiM is the first framework with decoder-free masked modeling of…
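
The computational asymmetry mentioned in the abstract can be illustrated with a rough back-of-the-envelope sketch. It assumes a transformer-style encoder whose self-attention cost grows quadratically with the number of tokens, and it counts only that quadratic term. The token count and mask ratio below are hypothetical values chosen for illustration, not figures from the paper, and the result is unrelated to the 7.89x reduction reported above.

```python
def encoder_token_count(num_tokens: int, mask_ratio: float) -> int:
    """Number of tokens the encoder processes when a fraction is masked out."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    return max(1, round(num_tokens * (1.0 - mask_ratio)))


def relative_attention_cost(num_tokens: int, mask_ratio: float) -> float:
    """Self-attention cost relative to the unmasked sequence.

    Only the quadratic (token-pair) term is counted; linear terms such as
    the feed-forward layers are ignored in this simplification.
    """
    visible = encoder_token_count(num_tokens, mask_ratio)
    return (visible / num_tokens) ** 2


# Hypothetical values, for illustration only (not taken from the paper).
NUM_TOKENS = 200   # assumed number of skeleton tokens per sequence
MASK_RATIO = 0.75  # assumed fraction of tokens hidden during pre-training

pretrain_cost = relative_attention_cost(NUM_TOKENS, MASK_RATIO)
downstream_cost = relative_attention_cost(NUM_TOKENS, 0.0)

print(f"pre-training attention cost (relative): {pretrain_cost}")
print(f"downstream attention cost (relative):   {downstream_cost}")
```

Under these assumptions the encoder sees a quarter of the tokens during pre-training, but downstream tasks feed it the full sequence, so the saving from masking does not carry over to inference.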

Taxonomy

Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Face recognition and analysis