CanonSLR: Canonical-View Guided Multi-View Continuous Sign Language Recognition
Xu Wang, Shengeng Tang, Wan Jiang, Yaxiong Wang, Lechao Cheng, Richang Hong

TL;DR
CanonSLR introduces a multi-view CSLR framework using canonical-view guidance, teacher-student learning, and motion modeling to improve robustness across viewpoints in sign language recognition.
Contribution
The paper proposes a multi-view CSLR approach that combines a frontal-view-anchored teacher-student strategy, Sequence-Level Soft-Target Distillation to reduce cross-view semantic discrepancy, motion modeling, and new multi-view benchmarks.
Findings
Outperforms existing methods on multi-view benchmarks.
Shows increased robustness to non-frontal viewpoints.
Provides a new multi-view sign language dataset pipeline.
Abstract
Continuous Sign Language Recognition (CSLR) has achieved remarkable progress in recent years; however, most existing methods are developed under single-view settings and thus remain insufficiently robust to viewpoint variations in real-world scenarios. To address this limitation, we propose CanonSLR, a canonical-view guided framework for multi-view CSLR. Specifically, we introduce a frontal-view-anchored teacher-student learning strategy, in which a teacher network trained on frontal-view data provides canonical temporal supervision for a student network trained on all viewpoints. To further reduce cross-view semantic discrepancy, we propose Sequence-Level Soft-Target Distillation, which transfers structured temporal knowledge from the frontal view to non-frontal samples, thereby alleviating gloss boundary ambiguity and category confusion caused by occlusion and projection variation. In…
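To make the idea of soft-target distillation from a frontal-view teacher more concrete, here is a minimal sketch of a generic temperature-scaled distillation loss averaged over the frames of a sequence. It is not the paper's exact Sequence-Level Soft-Target Distillation objective, which the truncated abstract does not specify. The function names, the default temperature, and the assumption that the teacher (frontal view) and student (any view) logits are aligned frame by frame are all illustrative choices.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one frame's gloss logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_soft_target_kd(teacher_logits, student_logits, temperature=2.0):
    """Mean per-frame KL(teacher || student) on temperature-softened outputs.

    Both arguments are lists of frames, each frame a list of class logits.
    Assumption (not from the paper): the two sequences have the same length
    and are aligned frame by frame.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("sequences must be non-empty and of equal length")

    total_kl = 0.0
    for teacher_frame, student_frame in zip(teacher_logits, student_logits):
        p = softmax(teacher_frame, temperature)  # soft targets from the teacher
        q = softmax(student_frame, temperature)  # student prediction
        total_kl += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    # The T^2 factor is the usual rescaling that keeps gradient magnitudes
    # comparable across temperatures in soft-target distillation.
    return (temperature ** 2) * total_kl / len(teacher_logits)


if __name__ == "__main__":
    # Hypothetical logits: 2 frames, 3 gloss classes.
    frontal_teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    side_view_student = [[1.0, 0.8, -0.2], [0.4, 0.9, 0.6]]
    print(sequence_soft_target_kd(frontal_teacher, side_view_student))
```

In a training loop, a term like this would typically be added to the main recognition loss with a weighting coefficient, so the student is pulled toward the teacher's softened per-frame distribution rather than only toward the hard gloss labels.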
