Fully Convolutional Networks for Continuous Sign Language Recognition
Ka Leong Cheng, Zhaoyang Yang, Qifeng Chen, Yu-Wing Tai

TL;DR
This paper introduces a fully convolutional network for online continuous sign language recognition. The network learns spatial and temporal features jointly from weakly annotated videos (sentence-level labels only), trains end-to-end without pre-training, and is evaluated on two large-scale SLR datasets.
Contribution
The paper proposes a novel fully convolutional network with a gloss feature enhancement module for end-to-end online sign language recognition from weakly annotated data.
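To illustrate what learning from sentence-level annotations involves: the network emits one prediction per short temporal window, and those per-window predictions must be reduced to a gloss sequence that can be compared with the sentence label. The following is a minimal sketch of one common reduction, assuming a CTC-style collapse (merge consecutive repeats, then drop a blank symbol). The summary above does not name the alignment objective, so the collapse rule, the `<blank>` symbol, the function name `collapse_predictions`, and the example glosses are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical "no gloss" symbol; an assumption, not from the paper


def collapse_predictions(window_labels, blank=BLANK):
    """Reduce per-window labels to a gloss sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: drop the blank symbol.
    """
    merged = [label for label, _ in groupby(window_labels)]
    return [label for label in merged if label != blank]


# Hypothetical per-window outputs for a short signing clip.
window_predictions = [
    "<blank>", "HELLO", "HELLO", "<blank>",
    "WORLD", "WORLD", "WORLD", "<blank>",
]
sentence = collapse_predictions(window_predictions)
print(sentence)
```

Because repeats are merged before blanks are dropped, a blank between two identical labels keeps them as two separate glosses, which is how a repeated sign can still be represented.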
Findings
Effective performance on two large-scale SLR datasets
Outperforms existing methods in online recognition
End-to-end training without pre-training
Abstract
Continuous sign language recognition (SLR) is a challenging task that requires learning on both spatial and temporal dimensions of signing frame sequences. Most recent work accomplishes this by using CNN and RNN hybrid networks. However, training these networks is generally non-trivial, and most of them fail to learn unseen sequence patterns, leading to unsatisfactory performance for online recognition. In this paper, we propose a fully convolutional network (FCN) for online SLR to concurrently learn spatial and temporal features from weakly annotated video sequences with only sentence-level annotations given. A gloss feature enhancement (GFE) module is introduced in the proposed network to enforce better sequence alignment learning. The proposed network is end-to-end trainable without any pre-training. We conduct experiments on two large-scale SLR datasets. Experiments show that our…
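One reason a fully convolutional design suits online recognition is that each output depends on a fixed-size window of input frames, so predictions can be produced as frames arrive. The sketch below computes that window (the temporal receptive field) and the output stride for a stack of 1D temporal layers. The layer configuration is hypothetical and is not taken from the paper; the function names `temporal_receptive_field` and `output_length` are also illustrative.

```python
def temporal_receptive_field(layers):
    """Return (receptive_field, stride) in frames for a stack of 1D layers.

    layers: list of (kernel_size, stride) pairs, in the order applied.
    """
    receptive_field = 1  # a single input frame before any layer
    jump = 1             # distance in input frames between adjacent outputs
    for kernel_size, stride in layers:
        receptive_field += (kernel_size - 1) * jump
        jump *= stride
    return receptive_field, jump


def output_length(num_frames, layers):
    """Number of outputs for num_frames inputs, assuming no padding."""
    length = num_frames
    for kernel_size, stride in layers:
        length = (length - kernel_size) // stride + 1
    return length


# Hypothetical stack: conv(k=5), pool(k=2, s=2), conv(k=5), pool(k=2, s=2).
layers = [(5, 1), (2, 2), (5, 1), (2, 2)]

window, step = temporal_receptive_field(layers)
print(window, step)               # frames seen per output, frames between outputs
print(output_length(20, layers))  # outputs produced for a 20-frame clip
```

Under this assumed configuration each output sees 16 frames and a new output appears every 4 frames, so an online system would only need to buffer the most recent window rather than the full sentence.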
