MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
Weichao Zhao, Hezhen Hu, Wengang Zhou, Yunyao Mao, Min Wang, Houqiang Li

TL;DR
MASA introduces a self-supervised learning framework for sign language recognition that explicitly models motion cues and aligns global semantic information, significantly improving representation capabilities and achieving state-of-the-art results.
Contribution
The paper proposes a novel MASA framework combining motion-aware masked autoencoding and semantic alignment for enhanced sign language recognition.
Findings
Achieves state-of-the-art performance on four benchmarks.
Effectively models dynamic motion cues in sign sequences.
Enhances global semantic understanding in sign language recognition.
Abstract
Sign language recognition (SLR) has long been plagued by insufficient model representation capabilities. Although current pre-training approaches have alleviated this dilemma to some extent and yielded promising performance by employing various pretext tasks on sign pose data, these methods still suffer from two primary limitations: 1) Explicit motion information is usually disregarded in previous pretext tasks, leading to partial information loss and limited representation capability. 2) Previous methods focus on the local context of a sign pose sequence, without incorporating the guidance of the global meaning of lexical signs. To this end, we propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information in a self-supervised learning paradigm for SLR. Our framework contains two crucial components, i.e., a…
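As a rough illustration of the first limitation named in the abstract, the sketch below shows one way explicit motion could be attached to a masked-autoencoding pretext task on sign pose data: frames are masked at random, and per-keypoint frame-to-frame displacements serve as an additional reconstruction target. This is not the authors' implementation. The function names, the zero-fill masking, the frame-difference definition of motion, and the toy data are all assumptions made for illustration, and the semantic alignment component is not covered.

```python
import random

def motion_targets(poses):
    """Per-keypoint displacement between consecutive frames.

    Assumption: "motion" is modeled as a simple frame difference;
    the first frame has no predecessor, so its motion is set to zero.
    """
    targets = [[(0.0, 0.0) for _ in poses[0]]]
    for prev, curr in zip(poses, poses[1:]):
        targets.append(
            [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev, curr)]
        )
    return targets

def mask_frames(poses, mask_ratio, seed=0):
    """Replace a random subset of frames with zeros (hypothetical masking scheme).

    Returns the masked sequence and the set of masked frame indices.
    """
    rng = random.Random(seed)
    num_masked = round(mask_ratio * len(poses))
    masked_idx = set(rng.sample(range(len(poses)), num_masked))
    masked = [
        [(0.0, 0.0)] * len(frame) if t in masked_idx else list(frame)
        for t, frame in enumerate(poses)
    ]
    return masked, masked_idx

# Toy pose sequence: 4 frames, 2 keypoints, (x, y) coordinates.
poses = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(1.0, 0.0), (1.0, 2.0)],
    [(3.0, 0.0), (1.0, 4.0)],
    [(6.0, 0.0), (1.0, 7.0)],
]

masked_poses, masked_idx = mask_frames(poses, mask_ratio=0.5)
targets = motion_targets(poses)
```

In a setup like this, an encoder would see `masked_poses`, and a decoder would be trained to recover both the original `poses` and the `targets` at the masked positions, so the model cannot ignore how keypoints move between frames.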
Taxonomy
Topics: Hand Gesture Recognition Systems · Gait Recognition and Analysis · Human Pose and Action Recognition
Methods: Focus · Surrogate Lagrangian Relaxation
