Visual Alignment Constraint for Continuous Sign Language Recognition
Yuecong Min, Aiming Hao, Xiujuan Chai, Xilin Chen

TL;DR
This paper introduces a Visual Alignment Constraint (VAC) to improve continuous sign language recognition by enhancing feature extractor training, reducing overfitting, and enabling end-to-end training with competitive results.
Contribution
The paper proposes VAC, a novel alignment supervision method with auxiliary losses and metrics to address overfitting in CSLR training.
Findings
VAC improves CSLR performance on challenging datasets.
VAC enables end-to-end trainable CSLR networks.
Proposed metrics effectively measure overfitting in CSLR models.
Abstract
Vision-based Continuous Sign Language Recognition (CSLR) aims to recognize unsegmented signs from image streams. Overfitting is one of the most critical problems in CSLR training, and previous works show that the iterative training scheme can partially solve this problem while also costing more training time. In this study, we revisit the iterative training scheme in recent CSLR works and realize that sufficient training of the feature extractor is critical to solving the overfitting problem. Therefore, we propose a Visual Alignment Constraint (VAC) to enhance the feature extractor with alignment supervision. Specifically, the proposed VAC comprises two auxiliary losses: one focuses on visual features only, and the other enforces prediction alignment between the feature extractor and the alignment module. Moreover, we propose two metrics to reflect overfitting by measuring the…
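The abstract describes VAC as two auxiliary losses added to the usual recognition objective: one computed on visual features only, and one that aligns the feature extractor's predictions with those of the alignment module. The sketch below illustrates only how such terms could be combined; it is not the authors' implementation. It assumes the prediction-alignment term is a temperature-softened KL divergence between per-frame class distributions, which the excerpt above does not specify. The function names, the temperature parameter, and the loss weights are all hypothetical, and the main and visual-only losses are passed in as precomputed numbers.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one frame's class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def alignment_loss(visual_logits, contextual_logits, temperature=1.0):
    """Mean per-frame KL between the two prediction streams.

    Assumption: the alignment module's output acts as the reference
    distribution and the feature extractor's output is pulled toward it.
    """
    per_frame = []
    for visual, contextual in zip(visual_logits, contextual_logits):
        reference = softmax(contextual, temperature)
        candidate = softmax(visual, temperature)
        per_frame.append(kl_divergence(reference, candidate))
    return sum(per_frame) / len(per_frame)


def total_loss(main_loss, visual_only_loss, align_loss,
               w_visual=1.0, w_align=1.0):
    """Main objective plus the two auxiliary terms (weights are hypothetical)."""
    return main_loss + w_visual * visual_only_loss + w_align * align_loss
```

In this form the alignment term is zero when both streams make identical per-frame predictions and grows as they diverge, which is one plausible reading of "enforces prediction alignment" in the abstract.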
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Gait Recognition and Analysis
Methods: Auxiliary Classifier · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Convolution · Bidirectional LSTM · CNN Bidirectional LSTM
