Diverse Signer Avatars with Manual and Non-Manual Feature Modelling for Sign Language Production
Mohamed Ilyes Lakhal, Richard Bowden

TL;DR
This paper introduces a sign language production (SLP) method that combines a Latent Diffusion Model (LDM) with a sign feature aggregation module. The method generates diverse, photorealistic signer avatars that model both manual features (e.g., the hands) and non-manual features (e.g., the face).
Contribution
The paper combines a Latent Diffusion Model (LDM) with a sign feature aggregation module to improve the diversity, visual quality, and non-manual feature modelling of sign language avatars.
Findings
Achieves higher visual quality than state-of-the-art SLP methods.
Preserves linguistic content across avatars generated from reference images with different ethnic backgrounds.
Reports significant improvements in perceptual metrics on the YouTube-SL-25 dataset.
Abstract
The diversity of sign representation is essential for Sign Language Production (SLP), as it captures variations in appearance, facial expressions, and hand movements. However, existing SLP models are often unable to capture diversity while preserving visual quality and modelling non-manual attributes such as emotions. To address this problem, we propose a novel approach that leverages a Latent Diffusion Model (LDM) to synthesise photorealistic digital avatars from a generated reference image. We further propose a sign feature aggregation module that explicitly models the non-manual features (e.g., the face) and the manual features (e.g., the hands). We show that our proposed module preserves linguistic content while seamlessly using reference images with different ethnic backgrounds to achieve diversity. Experiments on the YouTube-SL-25 sign language…
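The abstract describes the aggregation module only at a high level. As a rough illustration of what explicitly modelling manual and non-manual features separately can mean, the sketch below encodes the face and each hand as distinct regions of a 2D pose and then fuses them into a single conditioning vector. This is not the authors' module: the COCO-WholeBody keypoint layout, the centroid-and-scale normalisation, the concatenation-based fusion, and the names `REGIONS`, `encode_region`, and `aggregate` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical region layout, assuming the 133-keypoint COCO-WholeBody order:
# body (0-16), feet (17-22), face (23-90), left hand (91-111), right hand (112-132).
# Feet are omitted here because they carry little signing information.
REGIONS: Dict[str, range] = {
    "body": range(0, 17),
    "face": range(23, 91),          # non-manual features
    "left_hand": range(91, 112),    # manual features
    "right_hand": range(112, 133),  # manual features
}


def centroid(points: List[Point]) -> Point:
    """Mean (x, y) position of a list of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def encode_region(points: List[Point], anchor: Point) -> List[float]:
    """Encode one region as its offset from `anchor` plus its normalised local shape."""
    cx, cy = centroid(points)
    # Local shape: coordinates relative to the region's own centroid.
    local = [(x - cx, y - cy) for x, y in points]
    # Scale by the largest absolute offset so every region lies in [-1, 1],
    # which keeps small regions (hands, face) from being dwarfed by the body.
    scale = max(max(abs(dx), abs(dy)) for dx, dy in local) or 1.0
    shape = [c / scale for dx, dy in local for c in (dx, dy)]
    return [cx - anchor[0], cy - anchor[1]] + shape


def aggregate(keypoints: List[Point], regions: Dict[str, range] = REGIONS) -> List[float]:
    """Encode each region separately, then fuse by concatenation (hypothetical fusion)."""
    needed = max(r.stop for r in regions.values())
    if len(keypoints) < needed:
        raise ValueError(f"expected at least {needed} keypoints, got {len(keypoints)}")
    # Region positions are expressed relative to the body centroid, so the
    # whole vector is invariant to translating the signer within the frame.
    anchor = centroid([keypoints[i] for i in regions["body"]])
    vector: List[float] = []
    for idx in regions.values():
        vector.extend(encode_region([keypoints[i] for i in idx], anchor))
    return vector


if __name__ == "__main__":
    demo = [(float(i), float((3 * i) % 11)) for i in range(133)]
    features = aggregate(demo)
    # 2 offset values + 2 values per keypoint: 36 + 138 + 44 + 44 = 262
    print(len(features))
```

Because each region is centred and scaled on its own, a small change in hand shape is not swamped by large body movements. In a learned system, the hand-crafted normalisation would presumably be replaced by a trained encoder per region, and concatenation by whatever fusion the actual model uses.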
