Diverse Sign Language Translation
Xin Shen, Lei Shen, Shaozu Yuan, Heming Du, Haiyang Sun, Xin Yu

TL;DR
This paper introduces Diverse Sign Language Translation (DivSLT), a new task that generates multiple accurate and diverse textual translations from sign language videos. It addresses the limitations of one-to-one translation models, particularly when training data is limited.
Contribution
The paper proposes the DivSLT task, creates a benchmark with multi-reference data, and develops models employing multi-reference training and reinforcement learning to improve diversity and accuracy.
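As a rough illustration of what multi-reference training can mean, the sketch below aggregates a per-reference negative log-likelihood over several references for one video. It is a minimal sketch under stated assumptions, not the paper's objective: the function names, the "mean" and "min" strategies, and the toy token probabilities are all hypothetical, and a real model would obtain the probabilities from its decoder.

```python
import math


def sequence_nll(token_probs):
    """Negative log-likelihood of one reference sentence.

    token_probs holds the probability the model assigns to each token of
    that reference (toy numbers here, standing in for decoder outputs).
    """
    return -sum(math.log(p) for p in token_probs)


def multi_reference_loss(per_reference_token_probs, strategy="mean"):
    """Combine the losses of several references for one sign video.

    strategy="mean": average over all references (every reference counts).
    strategy="min":  keep only the reference the model already fits best.
    Both strategies are illustrative assumptions, not the paper's method.
    """
    nlls = [sequence_nll(probs) for probs in per_reference_token_probs]
    if not nlls:
        raise ValueError("at least one reference is required")
    if strategy == "mean":
        return sum(nlls) / len(nlls)
    if strategy == "min":
        return min(nlls)
    raise ValueError(f"unknown strategy: {strategy}")


# Two hypothetical references for the same video.
toy_probs = [[0.5, 0.5], [0.25]]
mean_loss = multi_reference_loss(toy_probs, strategy="mean")
min_loss = multi_reference_loss(toy_probs, strategy="min")
```

Averaging pushes the model to cover every reference, while taking the minimum only rewards the closest one; which trade-off suits DivSLT is a design question the summary above does not settle.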
Findings
The DivSLT benchmark model produces more diverse translations without sacrificing accuracy.
The use of large language models improves reference quality and annotation efficiency.
Reinforcement learning enhances translation performance and diversity.
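To make "diverse without sacrificing accuracy" concrete, the sketch below scores a set of candidate translations in two ways: a distinct-n ratio as a diversity proxy, and a best-match unigram F1 against multiple references as an accuracy proxy. These are generic stand-ins chosen for illustration; the summary does not state which metrics the paper reports, and the function names and whitespace tokenization are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(hypotheses, n=2):
    """Share of unique n-grams among all n-grams in the hypotheses.

    Higher values mean the candidate translations repeat each other less.
    """
    all_ngrams = [g for h in hypotheses for g in ngrams(h.split(), n)]
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


def unigram_f1(hypothesis, reference):
    """Unigram F1 between one hypothesis and one reference."""
    hyp_counts = Counter(hypothesis.split())
    ref_counts = Counter(reference.split())
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def best_reference_f1(hypothesis, references):
    """Score a hypothesis against its closest reference (multi-reference)."""
    return max(unigram_f1(hypothesis, ref) for ref in references)


# Hypothetical references and model outputs for one sign video.
references = ["the weather is nice today", "it is sunny today"]
hypotheses = ["it is sunny today", "the weather is nice"]
diversity = distinct_n(hypotheses, n=2)
accuracy = [best_reference_f1(h, references) for h in hypotheses]
```

Scoring each hypothesis against its closest reference avoids penalizing a valid translation merely because it differs from one particular reference.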
Abstract
Like spoken languages, a single sign language expression could correspond to multiple valid textual interpretations. Hence, learning a rigid one-to-one mapping for sign language translation (SLT) models might be inadequate, particularly in the case of limited data. In this work, we introduce a Diverse Sign Language Translation (DivSLT) task, aiming to generate diverse yet accurate translations for sign language videos. Firstly, we employ large language models (LLMs) to generate multiple references for the widely used CSL-Daily and PHOENIX14T SLT datasets. Here, native speakers are only invited to touch up inaccurate references, thus significantly improving the annotation efficiency. Secondly, we provide a benchmark model to spur research in this task. Specifically, we investigate multi-reference training strategies to enable our DivSLT model to achieve diverse translations. Then, to…
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Swearing, Euphemism, Multilingualism
