ZS-SLR: Zero-Shot Sign Language Recognition from RGB-D Videos

Razieh Rastgoo; Kourosh Kiani; Sergio Escalera

arXiv:2108.10059 · cs.CV · August 24, 2021 · 1 citation

Open Access

TL;DR

This paper introduces ZS-SLR, a zero-shot sign language recognition system that processes RGB-D videos with vision Transformers and maps the resulting visual features to linguistic embeddings of the class labels, reporting state-of-the-art results on four benchmark datasets.

Contribution

The paper presents a novel two-stream Transformer-based model for zero-shot sign language recognition from RGB-D videos, integrating human detection, segmentation, and semantic mapping.

Findings

01. Achieved state-of-the-art results on four benchmark datasets.

02. Effectively mapped visual features to linguistic embeddings.

03. Demonstrated robustness across multiple sign language datasets.

Abstract

Sign Language Recognition (SLR) is a challenging research area in computer vision. To tackle the annotation bottleneck in SLR, we formulate the problem of Zero-Shot Sign Language Recognition (ZS-SLR) and propose a two-stream model that takes two input modalities: RGB and Depth videos. To benefit from the capabilities of vision Transformers, we use two vision Transformer models, one for human detection and one for visual feature representation. We configure a Transformer encoder-decoder architecture as a fast and accurate human detection model to overcome the challenges of current human detection models. Using the human keypoints, the detected human body is segmented into nine parts. A spatio-temporal representation of the human body is obtained using a vision Transformer and an LSTM network. A semantic space maps the visual features to the lingual embedding of the class labels via a Bidirectional…
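The zero-shot step described in the abstract, mapping visual features into a semantic space shared with class-label embeddings, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear projection `W`, the use of cosine similarity, the helper names, and the toy numbers. In the actual system the visual feature would come from the Transformer and LSTM streams, and the label embeddings from a language model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project(W, x):
    """Hypothetical linear map from visual feature space to label-embedding space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def zero_shot_predict(visual_feature, W, class_embeddings):
    """Return the class whose label embedding is most similar to the projected feature."""
    z = project(W, visual_feature)
    return max(class_embeddings, key=lambda name: cosine(z, class_embeddings[name]))

# Toy, made-up numbers: 3-d visual features, 2-d label embeddings.
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
unseen_classes = {
    "hello": [1.0, 0.1],
    "thanks": [0.1, 1.0],
}
prediction = zero_shot_predict([0.9, 0.2, 0.1], W, unseen_classes)
print(prediction)
```

Because classification reduces to a nearest-embedding lookup, a class with no training videos can still be predicted as long as an embedding of its label is available, which is what makes the setting zero-shot.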

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Gait Recognition and Analysis

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Vision Transformer · Label Smoothing · Tanh Activation · Surrogate Lagrangian Relaxation