Hierarchical Sub-action Tree for Continuous Sign Language Recognition
Dejie Yang, Zhu Xu, Xinjie Gao, Yang Liu

TL;DR
This paper introduces the Hierarchical Sub-action Tree (HST) for continuous sign language recognition. HST draws gloss knowledge from large language models and uses contrastive learning to align it with visual features, improving transcription accuracy.
Contribution
It presents a novel HST-based framework that combines gloss knowledge with visual features, reducing complexity and enhancing modality alignment in CSLR.
Findings
Improved accuracy on four benchmark datasets.
Effective utilization of large language models for gloss knowledge.
Enhanced modality alignment through contrastive learning.
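As a rough illustration of what contrastive alignment between visual and textual features usually involves, here is a minimal sketch of a generic symmetric InfoNCE loss in plain Python. This page does not give the paper's actual objective, so this is not the authors' HST-based formulation: the function names, the 0.07 temperature, and the toy random embeddings are all assumptions made for the example.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (leave all-zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(visual, text, temperature=0.07):
    """Generic symmetric InfoNCE over paired embeddings.

    visual[i] and text[i] are assumed to describe the same sample;
    every other pairing in the batch is treated as a negative.
    """
    v = [normalize(x) for x in visual]
    t = [normalize(x) for x in text]
    # Cosine similarity of every visual/text pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
        for vi in v
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the visual-to-text and text-to-visual directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


# Toy data (hypothetical): 4 "gloss" embeddings of dimension 16.
random.seed(0)
text = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
# Visual embeddings that nearly match their paired text embeddings.
aligned = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in text]
# The same visual embeddings in reversed order, so no pair matches.
shuffled = aligned[::-1]

loss_aligned = symmetric_info_nce(aligned, text)
loss_shuffled = symmetric_info_nce(shuffled, text)
print("aligned pairs:", loss_aligned)
print("mismatched pairs:", loss_shuffled)
```

Correctly paired embeddings yield a lower loss than mismatched ones, which is the signal that pulls matching visual and textual representations together during training.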
Abstract
Continuous sign language recognition (CSLR) aims to transcribe untrimmed videos into glosses, which are typically textual words. Recent studies indicate that the scarcity of large datasets and precise annotations has become a bottleneck for CSLR. To address this, some works have developed cross-modal solutions to align visual and textual modalities. However, they typically extract textual features from glosses without fully utilizing their knowledge. In this paper, we propose a framework built on the Hierarchical Sub-action Tree (HST), termed HST-CSLR, to efficiently combine gloss knowledge with visual representation learning. By incorporating gloss-specific knowledge from large language models, our approach leverages textual information more effectively. Specifically, we construct an HST for textual information representation, aligning visual and textual modalities…
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Gait Recognition and Analysis
Methods: ALIGN
