Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
Ben Saunders, Necati Cihan Camgoz, Richard Bowden

TL;DR
This paper introduces a sign language production (SLP) method that splits the task into jointly trained translation and animation sub-tasks and animates signs with a Mixture of Motion Primitives, improving both translation accuracy and animation realism.
Contribution
It proposes a new architecture combining a progressive transformer with a Mixture of Motion Primitives for sign language animation, outperforming existing methods in translation and animation quality.
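To make the mixture idea concrete, here is a minimal Python sketch of how a set of motion-primitive "experts" could be blended frame by frame. It assumes that each primitive is a function from a latent frame vector to a pose vector and that a softmax gate produces the blending weights. The names `softmax`, `mix_primitives`, and `animate`, and the soft per-frame gating itself, are hypothetical illustrations rather than the authors' implementation, which may select or route between primitives differently.

```python
import math
from typing import Callable, List, Sequence

Pose = List[float]  # assumed layout: flattened joint coordinates for one frame


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_primitives(
    latent: Sequence[float],
    experts: Sequence[Callable[[Sequence[float]], Pose]],
    gate: Callable[[Sequence[float]], Sequence[float]],
) -> Pose:
    """Blend expert pose predictions for one frame (hypothetical sketch).

    Each expert stands in for a motion primitive that maps the shared latent
    frame to a pose; the gate scores the experts, and softmax turns the scores
    into convex blending weights.
    """
    weights = softmax(gate(latent))
    outputs = [expert(latent) for expert in experts]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(dim)]


def animate(
    latents: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Sequence[float]], Pose]],
    gate: Callable[[Sequence[float]], Sequence[float]],
) -> List[Pose]:
    """Apply the mixture independently to every frame of a latent sequence."""
    return [mix_primitives(z, experts, gate) for z in latents]


if __name__ == "__main__":
    # Two toy primitives: one drives the first coordinate, one the second.
    experts = [lambda z: [z[0], 0.0], lambda z: [0.0, z[1]]]
    equal_gate = lambda z: [0.0, 0.0]  # equal scores give equal weights
    print(animate([[2.0, 4.0]], experts, equal_gate))  # [[1.0, 2.0]]
```

A soft gate keeps the blend differentiable so that it could be trained end to end with the rest of the network; a hard, argmax-style selection of a single primitive per frame would be an equally plausible alternative.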
Findings
Outperforms baselines in user evaluations
Achieves an 11% improvement in back-translation performance
Shows stronger performance when translating directly from spoken language to sign than when producing sign from gloss
Abstract
It is common practice to represent spoken languages at their phonetic level. However, for sign languages, this implies breaking motion into its constituent motion primitives. Avatar-based Sign Language Production (SLP) has traditionally done just this, building up animation from sequences of hand motions, shapes and facial expressions. However, more recent deep-learning-based solutions to SLP have tackled the problem using a single network that estimates the full skeletal structure. We propose splitting the SLP task into two distinct jointly-trained sub-tasks. The first translation sub-task translates from spoken language to a latent sign language representation, with gloss supervision. Subsequently, the animation sub-task aims to produce expressive sign language sequences that closely resemble the learnt spatio-temporal representation. Using a progressive transformer for the…
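The joint training of the two sub-tasks can be sketched as a single objective: a pose-regression term on the animation output plus a weighted gloss-supervision term on the translation sub-task. The following minimal Python sketch rests on stated assumptions: mean squared error for poses, per-step cross-entropy for glosses, and a hypothetical `gloss_weight` balancing factor. The paper's actual losses and weighting may differ (gloss supervision could, for instance, use a sequence-level loss instead).

```python
import math
from typing import Sequence


def pose_mse(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all frames and joint coordinates (assumed pose loss)."""
    errors = [(p - t) ** 2 for pf, tf in zip(pred, target) for p, t in zip(pf, tf)]
    return sum(errors) / len(errors)


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a list of logits."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def gloss_cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Average per-step cross-entropy between gloss logits and gloss indices (assumed)."""
    losses = [log_sum_exp(step) - step[t] for step, t in zip(logits, targets)]
    return sum(losses) / len(losses)


def joint_loss(
    pred_poses: Sequence[Sequence[float]],
    target_poses: Sequence[Sequence[float]],
    gloss_logits: Sequence[Sequence[float]],
    gloss_targets: Sequence[int],
    gloss_weight: float = 1.0,  # hypothetical balancing factor
) -> float:
    """Animation (pose) loss plus weighted translation (gloss) supervision."""
    return pose_mse(pred_poses, target_poses) + gloss_weight * gloss_cross_entropy(
        gloss_logits, gloss_targets
    )


if __name__ == "__main__":
    # One frame with two coordinates, one gloss step over a two-gloss vocabulary.
    print(round(joint_loss([[0.0, 0.0]], [[1.0, 1.0]], [[0.0, 0.0]], [0]), 3))  # 1.693
```

Summing both terms into one scalar is what lets gradients from the pose output flow back into the translation sub-task, which is the point of training the two sub-tasks jointly rather than in sequence.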
