Uni-Sign: Toward Unified Sign Language Understanding at Scale
Zecheng Li, Wengang Zhou, Weichao Zhao, Kepeng Wu, Hezhen Hu, Houqiang Li

TL;DR
Uni-Sign introduces a unified pre-training and fine-tuning framework for sign language understanding (SLU). It combines large-scale generative pre-training with a task formulation that treats downstream SLU tasks as translation, closing the gap between pre-training and fine-tuning.
Contribution
The paper proposes a pre-training and fine-tuning approach that casts all SLU tasks as sign language translation, pre-trains on CSL-News, a large-scale Chinese Sign Language (CSL) dataset, and adds modules for fusing pose and RGB data to improve knowledge transfer.
Findings
Achieves state-of-the-art results on multiple SLU benchmarks.
Effectively fuses pose and RGB data for improved accuracy.
Addresses the gap between pre-training and fine-tuning in sign language understanding.
Abstract
Sign language pre-training has gained increasing attention for its ability to enhance performance across various sign language understanding (SLU) tasks. However, existing methods often suffer from a gap between pre-training and fine-tuning, leading to suboptimal results. To address this, we propose Uni-Sign, a unified pre-training framework that eliminates the gap between pre-training and downstream SLU tasks through a large-scale generative pre-training strategy and a novel fine-tuning paradigm. First, we introduce CSL-News, a large-scale Chinese Sign Language (CSL) dataset containing 1,985 hours of video paired with textual annotations, which enables effective large-scale pre-training. Second, Uni-Sign unifies SLU tasks by treating downstream tasks as a single sign language translation (SLT) task during fine-tuning, ensuring seamless knowledge transfer between pre-training and…
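The unified formulation described in the abstract can be illustrated with a minimal sketch. It assumes that each downstream task only differs in what its text target looks like: a single gloss for isolated recognition (ISLR), a gloss sequence for continuous recognition (CSLR), and a sentence for translation (SLT). The `Sample` class and `to_translation_target` function are hypothetical names introduced for illustration and are not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Sample:
    """Hypothetical container for one annotated sign video (illustration only)."""
    task: str                     # "ISLR", "CSLR", or "SLT"
    label: Union[str, List[str]]  # single gloss, gloss list, or sentence


def to_translation_target(sample: Sample) -> str:
    """Map any SLU annotation to a plain text target for a text-generating model.

    Assumption: once every label is a string, one generative decoder and one
    training loss can serve all three tasks.
    """
    if sample.task == "ISLR":
        if not isinstance(sample.label, str):
            raise TypeError("ISLR expects a single gloss string")
        return sample.label
    if sample.task == "CSLR":
        if isinstance(sample.label, str):
            raise TypeError("CSLR expects a list of glosses")
        # A gloss sequence becomes a space-separated "sentence" of glosses.
        return " ".join(sample.label)
    if sample.task == "SLT":
        if not isinstance(sample.label, str):
            raise TypeError("SLT expects a sentence string")
        return sample.label
    raise ValueError(f"unknown task: {sample.task}")


if __name__ == "__main__":
    print(to_translation_target(Sample("ISLR", "HELLO")))
    print(to_translation_target(Sample("CSLR", ["I", "LIKE", "TEA"])))
    print(to_translation_target(Sample("SLT", "I like tea.")))
```

Under this assumption, the model never needs a task-specific classification head: recognition outputs are simply short generated strings.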
Peer Reviews
Decision·ICLR 2025 Poster
1. The Uni-Sign framework proposed by the authors uses a large-scale generative pre-training strategy and a novel fine-tuning paradigm to bridge the gap between pre-training and downstream sign language understanding tasks found in traditional approaches. 2. The Uni-Sign framework achieves significant performance gains on both sign language recognition and translation tasks, and experiments are conducted on multiple datasets. 3. The related work of the paper is adequate, investigating research on si
1. The paper is not clear and detailed enough to explain the score-aware sampling strategy, and does not give a detailed analysis of the process or a corresponding explanation in Figure 5, which could lead to potential misunderstandings or errors. 2. The author omitted experimental results on several widely used datasets, such as Phoenix14, Phoenix14T, USTC-SLR 500, USTC-CSL100, etc. 3. As shown in Tables 4 and 6, the proposed Uni-Sign method does not achieve the best performance on multiple d
1. Uni-Sign effectively unifies multiple SLU tasks, such as isolated sign language recognition (ISLR), continuous sign language recognition (CSLR), and sign language translation (SLT), under a single framework. 2. The introduction of CSL-News, a substantial CSL dataset, provides a significant resource for the SLU field and addresses the limitations of prior smaller datasets.
1. Compared to other datasets, what unique advantages or characteristics does the proposed CSL-News dataset offer besides its longer duration? Additionally, why is the vocabulary size relatively limited, and could the restricted language variety impact pre-training effectiveness? 2. In the comparisons of downstream tasks in Section 4.3, did other methods also use the CSL-News dataset for pre-training? If not, does this raise any concerns about fairness in the comparisons? 3. In the comparative e
Originality: 1. The paper presents Uni-Sign, a novel unified pre-training framework for Sign Language Understanding (SLU) that bridges the gap between pre-training and downstream tasks by treating them as a single Sign Language Translation (SLT) task during fine-tuning. This approach deviates from previous methods that relied on indirect pretext tasks or were limited by data scale and transfer capability. 2. The authors introduce CSL-News, a large-scale Chinese Sign Language (CSL) dataset contai
1. Discussion on Computational Complexity: While the authors introduce a score-aware sampling strategy to improve efficiency, a more in-depth discussion on the computational complexity of Uni-Sign would be beneficial. This could include analyzing the trade-offs between accuracy and computational cost for different sampling probabilities and exploring potential optimizations. 2. Further Analysis of CSL-News: While the paper describes the creation of CSL-News, further analysis of the dataset's cha
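The accuracy versus cost trade-off raised in the review above can be made concrete with a small sketch. This page does not spell out how the score-aware sampling strategy works, so the following is one plausible reading and not the authors' implementation: it assumes each frame has a pose confidence score in [0, 1], that RGB features are computed only for a sampled subset of frames, and that low-confidence frames should be favoured. The function name `score_aware_sample` and the parameter `rgb_ratio` are hypothetical.

```python
import random
from typing import List


def score_aware_sample(confidences: List[float], rgb_ratio: float,
                       seed: int = 0) -> List[int]:
    """Choose frame indices that receive RGB features.

    Assumption (not from the paper): a frame is drawn with weight
    (1 - confidence), so frames with unreliable pose are picked more often.
    The RGB cost grows linearly with rgb_ratio, since exactly
    round(rgb_ratio * n) frames are selected.
    """
    if not 0.0 <= rgb_ratio <= 1.0:
        raise ValueError("rgb_ratio must lie in [0, 1]")
    n = len(confidences)
    k = round(rgb_ratio * n)
    rng = random.Random(seed)
    remaining = list(range(n))
    # Small epsilon keeps every weight positive even when confidence is 1.0.
    weights = [max(0.0, 1.0 - c) + 1e-6 for c in confidences]
    chosen = []
    for _ in range(k):
        # Weighted draw without replacement: pick one, then remove it.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return sorted(chosen)


if __name__ == "__main__":
    scores = [0.95, 0.20, 0.90, 0.10, 0.85, 0.30]
    for ratio in (0.0, 0.5, 1.0):
        picked = score_aware_sample(scores, ratio)
        print(f"rgb_ratio={ratio}: {len(picked)} RGB frames -> {picked}")
```

In this sketch, sweeping `rgb_ratio` from 0 to 1 moves from a pose-only model to full pose and RGB fusion, which is the kind of analysis the reviewer asks for.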
Taxonomy
Topics: Hearing Impairment and Communication · Hand Gesture Recognition Systems
Methods: Softmax · Attention Is All You Need
