Continuous Sign Language Recognition via Temporal Super-Resolution Network
Qidan Zhu, Jing Li, Fei Yuan, Quan Gan

TL;DR
This paper introduces TSRNet, a temporal super-resolution network that reconstructs dense feature sequences from sparse frames to enable real-time continuous sign language recognition with minimal accuracy loss.
Contribution
It proposes a novel TSRNet architecture with adversarial training and a new evaluation metric, WERD, to improve efficiency and accuracy in sign language recognition.
Findings
Reconstructs dense feature sequences from sparse frames effectively.
Maintains high recognition accuracy with reduced computation.
Demonstrates superior performance on large-scale datasets.
Abstract
Deep-learning-based spatial-temporal hierarchical models for continuous sign language recognition (CSLR) require a large amount of computation, which limits their real-time application. To address this problem, this paper proposes a temporal super-resolution network (TSRNet). The data is reconstructed into a dense feature sequence to reduce the overall model computation while keeping the loss in final recognition accuracy to a minimum. The CSLR model via TSRNet consists of three main parts: frame-level feature extraction, time-series feature extraction, and TSRNet, which is located between the other two. TSRNet mainly includes two branches: a detail descriptor and a rough descriptor. The sparse frame-level features are fused through the features obtained by the two designed branches as the…
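The placement described in the abstract, with a temporal super-resolution step sitting between frame-level and time-series feature extraction, can be illustrated with a minimal sketch. This is not the TSRNet architecture: the learned detail and rough descriptor branches are replaced here by plain linear interpolation, and the function names `subsample` and `upsample_linear` and the parameter `factor` are hypothetical. The sketch only shows the data flow in which expensive frame-level features are computed for a sparse subset of frames and a denser sequence is then rebuilt for the downstream sequence model.

```python
from typing import List

Feature = List[float]


def subsample(frames: List[Feature], factor: int) -> List[Feature]:
    """Keep every `factor`-th frame-level feature (the sparse input)."""
    return frames[::factor]


def upsample_linear(sparse: List[Feature], factor: int) -> List[Feature]:
    """Rebuild a denser feature sequence from sparse features.

    Assumption: linear interpolation stands in for the learned
    super-resolution step; the paper's branches are not modelled here.
    """
    if not sparse:
        return []
    dense: List[Feature] = []
    # Insert `factor - 1` interpolated features between each neighbouring pair.
    for left, right in zip(sparse, sparse[1:]):
        for step in range(factor):
            t = step / factor
            dense.append([(1 - t) * a + t * b for a, b in zip(left, right)])
    # The last sparse feature closes the sequence.
    dense.append(list(sparse[-1]))
    return dense


if __name__ == "__main__":
    sparse_features = [[0.0, 0.0], [4.0, 8.0]]
    for feature in upsample_linear(sparse_features, factor=4):
        print(feature)
```

With `n` sparse features and an upsampling factor `k`, this sketch returns `(n - 1) * k + 1` features, so the frame-level extractor runs on roughly `1/k` of the frames while the sequence model still receives a dense input.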
Taxonomy
Topics: Hand Gesture Recognition Systems · Gait Recognition and Analysis
