Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
Weichao Zhao, Wengang Zhou, Hezhen Hu, Min Wang, Houqiang Li

TL;DR
This paper introduces a self-supervised contrastive learning framework that leverages spatial-temporal consistency, multi-granularity features, and modality interactions to improve sign language recognition accuracy.
Contribution
It proposes a novel contrastive learning approach that exploits spatial-temporal cues and modality interactions for richer sign language representations.
Findings
Achieves state-of-the-art results on four benchmarks.
Effectively encodes fine-grained hand and coarse-grained trunk features.
Utilizes motion and joint modality interactions for enhanced learning.
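The findings mention interactions between a joint modality and a motion modality. In skeleton-based recognition, a motion stream is often derived from the joint stream by temporal differencing. The sketch below assumes that convention, which this page does not confirm for the paper. It also assumes a (T, V, C) NumPy layout and zero-padding of the final frame. The helper name `joints_to_motion` is hypothetical.

```python
import numpy as np


def joints_to_motion(joints: np.ndarray) -> np.ndarray:
    """Hypothetical helper: derive a motion stream from a pose sequence.

    joints has shape (T, V, C): T frames, V keypoints, C coordinates.
    Assumed convention: motion[t] = joints[t + 1] - joints[t]. The final
    frame has no successor, so it is left as zeros to preserve the shape.
    """
    motion = np.zeros_like(joints)
    motion[:-1] = joints[1:] - joints[:-1]
    return motion


# Usage: a toy sequence of 3 frames, 2 keypoints, 2D coordinates.
seq = np.arange(12, dtype=float).reshape(3, 2, 2)
mot = joints_to_motion(seq)
print(mot.shape)
```

Keeping the motion array the same shape as the joint array means both modalities can be fed to encoders with identical input layouts. That is one simple way to set up a joint/motion pairing.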
Abstract
Recently, there have been efforts to improve the performance of sign language recognition by designing self-supervised learning methods. However, these methods capture limited information from sign pose data in a frame-wise learning manner, leading to sub-optimal solutions. To this end, we propose a simple yet effective self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency from two distinct perspectives and learn instance-discriminative representations for sign language recognition. On one hand, since the semantics of sign language are expressed by the cooperation of fine-grained hands and coarse-grained trunks, we utilize information at both granularities and encode it into latent spaces. The consistency between hand and trunk features is constrained to encourage learning consistent representations of instance samples. On the other hand,…
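The abstract constrains the hand and trunk features of the same sample to be consistent. A common way to express such a cross-view consistency objective is an InfoNCE loss. In that loss, the matching hand/trunk pair in a batch is the positive, and every other pairing is a negative. The sketch below assumes this formulation, precomputed NumPy embeddings, and a temperature of 0.07. The exact loss used by the authors is not given in this excerpt, and all function names here are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Project each row onto the unit sphere so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(anchor: np.ndarray, positive: np.ndarray, temperature: float = 0.07) -> float:
    # Row i of `anchor` should match row i of `positive`.
    # Every other row of `positive` in the batch acts as a negative.
    logits = l2_normalize(anchor) @ l2_normalize(positive).T / temperature  # (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pair) as the target class.
    return float(-np.mean(np.diag(log_prob)))


def hand_trunk_consistency(hand_feat: np.ndarray, trunk_feat: np.ndarray,
                           temperature: float = 0.07) -> float:
    # Symmetric variant (an assumption): average the hand->trunk and trunk->hand directions.
    return 0.5 * (info_nce(hand_feat, trunk_feat, temperature)
                  + info_nce(trunk_feat, hand_feat, temperature))


rng = np.random.default_rng(0)
hand = rng.normal(size=(8, 16))                 # hypothetical hand embeddings, 8 samples
trunk = hand + 0.1 * rng.normal(size=(8, 16))   # trunk embeddings of the same samples
aligned = hand_trunk_consistency(hand, trunk)
shuffled = hand_trunk_consistency(hand, trunk[::-1])  # pairs deliberately mismatched
print(aligned, shuffled)
```

When each hand embedding sits next to its own trunk embedding, the loss stays near zero. When the pairs are scrambled, the loss grows. This is the pressure toward instance-level consistency that the abstract describes.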
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Gait Recognition and Analysis
Methods: Contrastive Learning
