Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition

Weichao Zhao; Wengang Zhou; Hezhen Hu; Min Wang; Houqiang Li

arXiv:2406.10501 · cs.CV · June 18, 2024

TL;DR

This paper introduces a self-supervised contrastive learning framework that leverages spatial-temporal consistency, multi-granularity features, and modality interactions to improve sign language recognition accuracy.

Contribution

It proposes a novel contrastive learning approach that exploits spatial-temporal cues and modality interactions for richer sign language representations.

Findings

01 Achieves state-of-the-art results on four benchmarks.

02 Effectively encodes fine-grained hand features and coarse-grained trunk features.

03 Utilizes interactions between the motion and joint modalities for enhanced learning.

Abstract

Recently, there have been efforts to improve performance in sign language recognition by designing self-supervised learning methods. However, these methods capture limited information from sign pose data in a frame-wise learning manner, leading to sub-optimal solutions. To this end, we propose a simple yet effective self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency from two distinct perspectives and learn instance discriminative representation for sign language recognition. On one hand, since the semantics of sign language are expressed by the cooperation of fine-grained hands and coarse-grained trunks, we utilize information at both granularities and encode it into latent spaces. The consistency between hand and trunk features is constrained to encourage learning consistent representation of instance samples. On the other hand,…
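The abstract does not spell out the loss used to constrain hand-trunk consistency, so the following is only a minimal sketch of one common choice, a generic InfoNCE contrastive loss, written in plain Python. The function names (`l2_normalize`, `cosine_sim`, `info_nce`), the temperature value, and the toy feature vectors are all assumptions made for illustration and are not taken from the paper or its repository.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def info_nce(hand_feats, trunk_feats, temperature=0.1):
    """Generic InfoNCE loss (an assumed stand-in, not the paper's exact loss).

    hand_feats[i] and trunk_feats[i] are treated as a positive pair from the
    same sample; every other trunk feature in the batch acts as a negative.
    """
    n = len(hand_feats)
    total = 0.0
    for i in range(n):
        logits = [
            cosine_sim(hand_feats[i], trunk_feats[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching trunk feature.
        total += log_denominator - logits[i]
    return total / n


# Toy batch of two samples with 2-D features (hypothetical values).
hand = [[1.0, 0.0], [0.0, 1.0]]
trunk_aligned = [[1.0, 0.0], [0.0, 1.0]]
trunk_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(hand, trunk_aligned)
loss_swapped = info_nce(hand, trunk_swapped)
```

In this toy setup the loss is small when each hand feature points in the same direction as its own trunk feature and large when the pairs are swapped, which is the behavior a consistency constraint of this kind is meant to reward.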

Code & Models

Repositories

sakura2233565548/Self-Supervised-Representation-Learning-with-Spatial-Temporal-Consistency-for-SLR
pytorch

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Gait Recognition and Analysis

Methods: Contrastive Learning