American Sign Language Video to Text Translation
Parsheeta Roy, Ji-Eun Han, Srishti Chouhan, Bhaavanaa Thumu

TL;DR
This paper replicates and tries to improve a sign language video-to-text translation model, shows how strongly training choices affect translation quality, and proposes directions for future work.
Contribution
It replicates a recent study, conducts ablation experiments on model components, and suggests improvements for visual feature extraction and decoder integration.
Findings
Model performance is strongly affected by the choice of optimizer, activation function, and label smoothing.
Evaluation with BLEU and rBLEU metrics confirms the importance of training choices.
Source code availability facilitates future research and replication.
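As an illustration of one of the training choices named above, the following is a minimal sketch of label smoothing applied to a cross-entropy loss. It is not the paper's code: the function names, the smoothing value `epsilon=0.1`, and the convention of spreading the smoothing mass uniformly over all classes are assumptions made for this example.

```python
import math


def smooth_targets(true_index, num_classes, epsilon=0.1):
    """Build a label-smoothed target distribution (hypothetical helper).

    Every class receives epsilon / num_classes, and the true class
    additionally receives the remaining 1 - epsilon.
    """
    uniform = epsilon / num_classes
    targets = [uniform] * num_classes
    targets[true_index] += 1.0 - epsilon
    return targets


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def smoothed_cross_entropy(logits, true_index, epsilon=0.1):
    """Cross-entropy between smoothed targets and the model's softmax output."""
    log_probs = log_softmax(logits)
    targets = smooth_targets(true_index, len(logits), epsilon)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With `epsilon=0.0` this reduces to ordinary cross-entropy. With a positive `epsilon`, a very confident correct prediction is penalized slightly, which discourages the decoder from producing over-confident token distributions.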
Abstract
Sign language-to-text translation is a crucial technology that can break down communication barriers for individuals with hearing difficulties. We replicate a recently published study and try to improve on it. We evaluate models using the BLEU and rBLEU metrics to measure translation quality. In our ablation study, we found that the model's performance is significantly influenced by the choice of optimizer, activation function, and label smoothing. Future work aims to refine visual feature extraction, make better use of the decoder, and integrate pre-trained decoders for better translation outcomes. Our source code is available to facilitate replication of our results and encourage future research.
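To make the BLEU metric mentioned in the abstract concrete, here is a simplified sentence-level sketch using clipped n-gram precision and a brevity penalty. It is only an illustration under stated assumptions: whitespace tokenization, a single reference, and no smoothing, so any hypothesis missing a matching n-gram of some order scores zero. Published results are normally computed at corpus level with a standard tool, and the rBLEU variant is not reproduced here. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed single-reference BLEU for one sentence pair (illustrative)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

An exact match scores 1.0, while a hypothesis that is a correct but truncated prefix of the reference keeps perfect n-gram precision and is reduced only by the brevity penalty.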
Taxonomy
Topics: Hearing Impairment and Communication · Hand Gesture Recognition Systems · Subtitles and Audiovisual Media
