American Sign Language Video to Text Translation

Parsheeta Roy; Ji-Eun Han; Srishti Chouhan; Bhaavanaa Thumu

arXiv:2402.07255 · cs.CL · February 13, 2024 · 1 citation

Open Access

TL;DR

This paper replicates a recent sign language video-to-text translation study and tries to improve on it. Ablations show that training choices (optimizer, activation function, and label smoothing) significantly influence translation quality, and the authors outline directions for future work.

Contribution

It replicates a recent study, runs ablation experiments on optimizers, activation functions, and label smoothing, and suggests future improvements to visual feature extraction, decoder utilization, and the integration of pre-trained decoders.

Findings

1. Model performance is significantly influenced by the choice of optimizer, activation function, and label smoothing.

2. Evaluation with the BLEU and rBLEU metrics confirms the importance of these training choices.

3. The source code is available, which facilitates replication and future research.

Abstract

Sign language to text is a crucial technology that can break down communication barriers for individuals with hearing difficulties. We replicate and try to improve on a recently published study. We evaluate models using BLEU and rBLEU metrics to ensure translation quality. During our ablation study, we found that the model's performance is significantly influenced by optimizers, activation functions, and label smoothing. Further research aims to refine visual feature capturing, enhance decoder utilization, and integrate pre-trained decoders for better translation outcomes. Our source code is available to facilitate replication of our results and encourage future research.
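Label smoothing is one of the training choices the abstract singles out, so a small illustration may help readers unfamiliar with it. The sketch below is not taken from the paper's code: it assumes the common formulation in which the one-hot target is mixed with a uniform distribution over all classes, and the function names and the value `epsilon=0.1` are hypothetical choices made only for this example.

```python
import math


def smoothed_targets(num_classes, target, epsilon):
    """Mix the one-hot target with a uniform distribution over all classes.

    Assumed formulation: q = (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    dist = [epsilon / num_classes] * num_classes
    dist[target] += 1.0 - epsilon
    return dist


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy between the smoothed target and the model distribution."""
    log_probs = log_softmax(logits)
    q = smoothed_targets(len(logits), target, epsilon)
    return -sum(qi * lp for qi, lp in zip(q, log_probs))


# Hypothetical logits for a 3-token vocabulary; token 0 is the reference token.
logits = [5.0, 0.0, 0.0]
hard_loss = label_smoothed_cross_entropy(logits, target=0, epsilon=0.0)
soft_loss = label_smoothed_cross_entropy(logits, target=0, epsilon=0.1)
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy. With a positive `epsilon`, a confident correct prediction is penalized slightly more, which discourages the model from becoming overconfident.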

Taxonomy

Topics: Hearing Impairment and Communication · Hand Gesture Recognition Systems · Subtitles and Audiovisual Media