Towards Automatic Speech to Sign Language Generation
Parul Kapoor, Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay Namboodiri, C V Jawahar

TL;DR
This paper introduces the first system to generate continuous sign language videos directly from speech, utilizing a new dataset and a multi-tasking transformer model to improve naturalness and accuracy.
Contribution
It presents a novel end-to-end approach for speech-to-sign language generation without relying on text, supported by a new Indian sign language dataset with speech annotations.
Findings
Effective sign pose sequence generation demonstrated
Multi-tasking transformer outperforms baselines
Ablation studies highlight key module contributions
Abstract
We aim to solve the highly challenging task of generating continuous sign language videos solely from speech segments for the first time. Recent efforts in this space have focused on generating such videos from human-annotated text transcripts without considering other modalities. However, translating speech directly into sign language is a practical solution when communicating with people with hearing loss. Therefore, we eliminate the need for text as input and design techniques that work for more natural, continuous, freely uttered speech covering an extensive vocabulary. Since current datasets are inadequate for generating sign language directly from speech, we collect and release the first Indian sign language dataset comprising speech-level annotations, text transcripts, and the corresponding sign language videos. Next, we propose a multi-tasking transformer…
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication
