TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions
Hui Lu, Albert Ali Salah, Ronald Poppe

TL;DR
TCNet is a hybrid model for continuous sign language recognition that captures long-range spatio-temporal interactions efficiently using trajectories and correlated regions, achieving state-of-the-art results.
Contribution
The paper introduces TCNet, a novel hybrid network whose trajectory and correlation modules reduce computation and improve recognition accuracy in continuous sign language recognition (CSLR).
Findings
Achieves state-of-the-art performance on four large-scale CSLR datasets.
Reduces word error rate (WER) by 1.5% on PHOENIX14.
Reduces computational cost through dynamic attention mechanisms.
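Word error rate, the metric behind the PHOENIX14 figure above, is conventionally the word-level edit distance between the predicted and reference gloss sequences, divided by the reference length. The following is a minimal sketch of that standard definition, not the evaluation script used in the paper; the function name `word_error_rate` and the placeholder glosses are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one gloss")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j glosses of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference gloss missing
                curr[j - 1] + 1,     # insertion: extra hypothesis gloss
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder glosses (hypothetical, not taken from any dataset).
reference = ["A", "B", "C"]
hypothesis = ["A", "C"]
print(word_error_rate(reference, hypothesis))  # one deletion over three glosses
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.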
Abstract
A key challenge in continuous sign language recognition (CSLR) is to efficiently capture long-range spatial interactions over time from the video input. To address this challenge, we propose TCNet, a hybrid network that effectively models spatio-temporal information from Trajectories and Correlated regions. TCNet's trajectory module transforms frames into aligned trajectories composed of continuous visual tokens. In addition, for a query token, self-attention is learned along the trajectory. As such, our network can also focus on fine-grained spatio-temporal patterns, such as finger movements, of a specific region in motion. TCNet's correlation module uses a novel dynamic attention mechanism that filters out irrelevant frame regions. Additionally, it assigns dynamic key-value tokens from correlated regions to each query. Both innovations significantly reduce the computation cost and…
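The idea of giving each query only the key-value tokens from relevant regions can be illustrated with a generic top-k sparse attention sketch. This is not the authors' implementation: TCNet learns which regions are correlated, whereas the sketch below simply assumes that the k keys with the highest dot-product scores are the relevant ones. The function name `topk_attention` and all parameters are hypothetical.

```python
import math


def topk_attention(query, keys, values, k):
    """Attend from one query to only its k highest-scoring keys."""
    d = len(query)
    # Scaled dot-product score between the query and every candidate key.
    scores = [
        sum(q * x for q, x in zip(query, kvec)) / math.sqrt(d)
        for kvec in keys
    ]
    # Keep the indices of the k best-scoring keys; the rest are filtered out.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores only (shifted by the max for stability).
    top = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - top) for i in keep]
    total = sum(weights)
    # Weighted sum of the kept value vectors.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, keep):
        for c, v in enumerate(values[i]):
            out[c] += (w / total) * v
    return out, keep


# Toy example: three candidate regions, keep the two most relevant.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
output, kept = topk_attention(query, keys, values, k=2)
print(kept)  # indices of the regions that contributed to the output
```

With n candidate regions, the softmax and weighted sum touch only k of them per query, which is the source of the savings in this kind of scheme; scoring all n keys first, as done here, is a simplification.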
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Gait Recognition and Analysis
Methods: Focus · Circular Smooth Label
