Global-local Enhancement Network for NMFs-aware Sign Language Recognition
Hezhen Hu, Wengang Zhou, Junfu Pu, Houqiang Li

TL;DR
This paper introduces GLE-Net, a two-stream neural network that captures both global context and fine-grained non-manual features (NMFs) for sign language recognition, together with a new dataset, NMFs-CSL.
Contribution
The paper proposes a global-local enhancement network (GLE-Net) for sign language recognition and introduces NMFs-CSL, the first dataset focusing on non-manual features.
Findings
GLE-Net outperforms existing methods on the NMFs-CSL and SLR500 datasets.
The new dataset NMFs-CSL enables better modeling of non-manual features.
Incorporating non-manual features improves recognition accuracy.
Abstract
Sign language recognition (SLR) is a challenging problem, involving complex manual features, i.e., hand gestures, and fine-grained non-manual features (NMFs), e.g., facial expressions and mouth shapes. Although manual features are dominant, non-manual features also play an important role in the expression of a sign word. Specifically, many sign words convey different meanings due to non-manual features, even though they share the same hand gestures. This ambiguity introduces great challenges in the recognition of sign words. To tackle the above issue, we propose a simple yet effective architecture called Global-local Enhancement Network (GLE-Net), including two mutually promoted streams towards different crucial aspects of SLR. Of the two streams, one captures the global contextual relationship, while the other stream captures the discriminative fine-grained cues. Moreover, due to the…
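
To make the two-stream idea concrete, here is a minimal sketch of how scores from a global stream and a local (fine-grained) stream might be combined for signs that share a hand gesture. It is not GLE-Net itself: the abstract excerpt does not describe how the streams are fused or how they promote each other, so the weighted averaging of class probabilities (plain late fusion), the function names `fuse_streams` and `predict`, the `local_weight` parameter, and all scores are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(global_logits, local_logits, local_weight=0.5):
    """Late fusion (an assumption, not the paper's method):
    weighted average of the two streams' class probabilities."""
    if len(global_logits) != len(local_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must lie in [0, 1]")
    p_global = softmax(global_logits)
    p_local = softmax(local_logits)
    return [
        (1.0 - local_weight) * g + local_weight * l
        for g, l in zip(p_global, p_local)
    ]


def predict(probs):
    """Index of the most probable class (first index wins on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical scores for three sign words. Words 0 and 1 share the same
# hand gesture, so the global stream cannot separate them; the local
# stream, which looks at face and mouth cues, favors word 1.
global_logits = [2.0, 2.0, -1.0]
local_logits = [0.5, 2.5, 0.0]

fused = fuse_streams(global_logits, local_logits)
print("global only:", predict(softmax(global_logits)))
print("fused:", predict(fused))
```

In this toy case the global stream alone is tied between the two look-alike signs, and the local evidence breaks the tie. A learned model would typically train both streams and their combination jointly rather than use a fixed weight.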
