Sign Language Translation with Iterative Prototype
Huijie Yao, Wengang Zhou, Hao Feng, Hezhen Hu, Hao Zhou, Houqiang Li

TL;DR
This paper introduces IP-SLT, an iterative prototype refinement framework for sign language translation that improves semantic representation and translation accuracy by mimicking human reading behavior.
Contribution
The paper proposes a novel iterative refinement approach for sign language translation, enhancing semantic prototypes through cross-attention and iterative distillation, with minimal inference overhead.
Findings
Improves translation accuracy over baseline SLT systems
Demonstrates superior performance on public benchmarks
Keeps the computational cost of iterative refinement acceptable
Abstract
This paper presents IP-SLT, a simple yet effective framework for sign language translation (SLT). IP-SLT adopts a recurrent structure and enhances the semantic representation (prototype) of the input sign language video through iterative refinement. The idea mimics human reading, where a sentence can be digested repeatedly until it is accurately understood. Technically, IP-SLT consists of three modules: feature extraction, prototype initialization, and iterative prototype refinement. The initialization module generates the initial prototype from the visual features produced by the feature extraction module. The iterative refinement module then uses cross-attention to polish the previous prototype by aggregating it with the original video features. Through repeated refinement, the prototype finally converges to a more stable and accurate state,…
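The refinement loop described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear initialization, the residual update, the single attention head, the shared weights across iterations, and all names (`refine_prototype`, `w_init`, `w_q`, `w_k`, `w_v`, `num_iters`) are assumptions chosen only to show the general pattern of a prototype repeatedly querying the original video features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(prototype, video_feat, w_q, w_k, w_v):
    # Queries come from the current prototype; keys and values come
    # from the original video features.
    q = prototype @ w_q
    k = video_feat @ w_k
    v = video_feat @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def refine_prototype(video_feat, w_init, w_q, w_k, w_v, num_iters=3):
    # Hypothetical initialization: a linear projection of the video features.
    prototype = video_feat @ w_init
    history = [prototype]
    for _ in range(num_iters):
        # Assumed residual update; the same weights are reused at every
        # step, which gives the loop its recurrent structure.
        prototype = prototype + cross_attention(prototype, video_feat, w_q, w_k, w_v)
        history.append(prototype)
    return prototype, history

# Toy input: 16 frame-level features of dimension 8 (random placeholders).
rng = np.random.default_rng(0)
T, d = 16, 8
video_feat = rng.standard_normal((T, d))
w_init, w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))

final, history = refine_prototype(video_feat, w_init, w_q, w_k, w_v, num_iters=3)
print(final.shape, len(history))
```

In this sketch the cost of extra refinement grows linearly with `num_iters`, since each step is one attention pass over the same video features.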
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Multimodal Machine Learning Applications
