Two-Stream Network for Sign Language Recognition and Translation

Yutong Chen; Ronglai Zuo; Fangyun Wei; Yu Wu; Shujie Liu; Brian Mak

arXiv:2211.01367 · cs.CV · March 24, 2023 · 56 citations

TL;DR

This paper introduces a two-stream network for sign language recognition and translation that encodes raw RGB videos and keypoint sequences in separate streams, and reports state-of-the-art results.

Contribution

The paper proposes TwoStream, a network with a dual visual encoder that models raw videos and keypoint sequences in separate streams. The streams interact through bidirectional lateral connections, a sign pyramid network with auxiliary supervision, and frame-level self-distillation.

Findings

01 Achieves state-of-the-art results on multiple datasets

02 Effectively models both visual and keypoint information

03 Demonstrates the benefit of dual-stream interaction mechanisms

Abstract

Sign languages are visual languages using manual articulations and non-manual elements to convey information. For sign language recognition and translation, the majority of existing approaches directly encode RGB videos into hidden representations. RGB videos, however, are raw signals with substantial visual redundancy, leading the encoder to overlook the key information for sign language understanding. To mitigate this problem and better incorporate domain knowledge, such as handshape and body movement, we introduce a dual visual encoder containing two separate streams to model both the raw videos and the keypoint sequences generated by an off-the-shelf keypoint estimator. To make the two streams interact with each other, we explore a variety of techniques, including bidirectional lateral connection, sign pyramid network with auxiliary supervision, and frame-level self-distillation.…
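
The bidirectional lateral connection mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes both streams already produce per-frame feature vectors of the same shape, and it models the exchange as a weighted element-wise addition. The function name `lateral_exchange` and the mixing weights `alpha` and `beta` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One feature vector per frame: shape (num_frames, feature_dim).
Features = List[List[float]]


def lateral_exchange(
    video: Features,
    keypoint: Features,
    alpha: float = 0.5,  # hypothetical weight for keypoint -> video
    beta: float = 0.5,   # hypothetical weight for video -> keypoint
) -> Tuple[Features, Features]:
    """Let two feature streams exchange information in both directions.

    Each stream receives a weighted copy of the other stream's features
    as they were *before* the exchange, so the update order does not matter.
    """
    same_shape = len(video) == len(keypoint) and all(
        len(v) == len(k) for v, k in zip(video, keypoint)
    )
    if not same_shape:
        raise ValueError("both streams must have the same (frames, dim) shape")

    # Keypoint features flow into the video stream.
    video_out = [
        [v + alpha * k for v, k in zip(v_frame, k_frame)]
        for v_frame, k_frame in zip(video, keypoint)
    ]
    # Video features flow into the keypoint stream.
    keypoint_out = [
        [k + beta * v for v, k in zip(v_frame, k_frame)]
        for v_frame, k_frame in zip(video, keypoint)
    ]
    return video_out, keypoint_out


if __name__ == "__main__":
    video_feats = [[1.0, 2.0], [3.0, 4.0]]
    keypoint_feats = [[10.0, 20.0], [30.0, 40.0]]
    v, k = lateral_exchange(video_feats, keypoint_feats)
    print(v)
    print(k)
```

In a real model the two streams would be deep video backbones, and the exchange would typically sit between backbone stages and may involve learned alignment layers. The sketch only shows the direction of information flow.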

Code & Models

Repositories

FangyunWei/SLRT
PyTorch · Official

Videos

Two-Stream Network for Sign Language Recognition and Translation · SlidesLive

Taxonomy

Topics: Hand Gesture Recognition Systems · Gait Recognition and Analysis · Human Pose and Action Recognition

Methods: Surrogate Lagrangian Relaxation