Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation

Kazi Mahathir Rahman; Naveed Imtiaz Nafis; Md. Farhan Sadik; Mohammad Al Rafi; Mehedi Hasan Shahed

arXiv:2507.06530·cs.CV·July 10, 2025

Open Access

TL;DR

This paper presents a comprehensive multi-modal pipeline that converts spoken English into realistic 3D sign language animations, integrating speech recognition, translation, and motion synthesis.

Contribution

It introduces a complete system combining speech-to-text, translation to ASL gloss, and 3D animation, with new datasets and a multi-modal approach not previously explored.

Findings

01

Achieved BLEU scores of 0.7714 and 0.8923 for English-to-ASL-gloss translation.

02

Created the Sign3D-WLASL dataset for motion training.

03

Developed the BookGlossCorpus-CG dataset for English-to-ASL translation.
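The BLEU figures in the first finding are on a 0 to 1 scale. As a rough illustration of how such a score is computed, here is a minimal sentence-level sketch: clipped n-gram precision, a geometric mean over n-gram orders, and a brevity penalty. The function names `ngrams` and `bleu`, the whitespace tokenization, and the example gloss strings are assumptions made for illustration; the summary does not say which BLEU variant or tooling the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU against one reference, uniform weights, no smoothing.

    Illustrative only: returns 0.0 as soon as any n-gram order has no match.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical gloss strings, not taken from the paper's datasets.
exact = bleu("STORE I GO TOMORROW", "STORE I GO TOMORROW")
short = bleu("STORE I GO", "STORE I GO TOMORROW", max_n=2)
```

An exact match scores 1.0, while the shorter hypothesis is pulled down only by the brevity penalty, since all of its unigrams and bigrams appear in the reference.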

Abstract

Helping deaf and hard-of-hearing people communicate more easily is the main goal of Automatic Sign Language Translation. Although most past research has focused on turning sign language into text, the reverse direction, turning spoken English into sign language animations, has been largely overlooked. That's because it involves multiple steps, such as understanding speech, translating it into sign-friendly grammar, and generating natural human motion. In this work, we introduce a complete pipeline that converts English speech into smooth, realistic 3D sign language animations. Our system starts with Whisper to transcribe spoken English into text. Then, we use a MarianMT machine translation model to translate that text into American Sign Language (ASL) gloss, a simplified version of sign language that captures meaning without grammar. This model performs well, reaching BLEU scores of 0.7714…
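The staged design described in the abstract (speech to text, text to ASL gloss, gloss to animation) can be sketched as three composable callables. This is a minimal structural sketch, not the authors' implementation: the class `SpeechToSignPipeline`, the three stub functions, the stop-word list, and the `clip:` identifiers are all hypothetical placeholders. In the described system the first two stages would be backed by Whisper and a MarianMT model, and the third by a motion synthesis component.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stage signatures (assumed for illustration).
Transcriber = Callable[[str], str]                  # audio path -> English text
GlossTranslator = Callable[[str], List[str]]        # English text -> gloss tokens
MotionSynthesizer = Callable[[List[str]], List[str]]  # gloss tokens -> motion clip ids


@dataclass
class SpeechToSignPipeline:
    transcribe: Transcriber
    to_gloss: GlossTranslator
    synthesize: MotionSynthesizer

    def run(self, audio_path: str) -> List[str]:
        """Run the three stages in order and return the motion clip ids."""
        text = self.transcribe(audio_path)
        gloss = self.to_gloss(text)
        return self.synthesize(gloss)


# --- Hypothetical stand-ins so the sketch runs without any model ---

def stub_transcribe(audio_path: str) -> str:
    # A real system would run a speech recognizer here.
    return "I am going to the store tomorrow"


def stub_to_gloss(text: str) -> List[str]:
    # Toy rule: drop a few function words and uppercase the rest.
    # A trained translation model would replace this.
    dropped = {"am", "is", "are", "a", "an", "the", "to"}
    return [w.upper() for w in text.split() if w.lower() not in dropped]


def stub_synthesize(gloss: List[str]) -> List[str]:
    # Map each gloss token to a placeholder motion clip identifier.
    return [f"clip:{token}" for token in gloss]


pipeline = SpeechToSignPipeline(stub_transcribe, stub_to_gloss, stub_synthesize)
clips = pipeline.run("example.wav")
print(clips)
```

Keeping each stage behind a plain callable means any one of them can be swapped or tested in isolation, which matters for a pipeline where errors in an early stage propagate to the final animation.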

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Motion and Animation · Hearing Impairment and Communication

Methods: fastText · Focus