Pose-Guided Sign Language Video GAN with Dynamic Lambda
Christopher Kissel, Christopher Kümmel, Dennis Ritter, Kristian Hildebrand

TL;DR
This paper introduces a GAN-based method for synthesizing photorealistic sign language videos guided by target poses and region-level spatial layouts, improving realism and generalization to diverse signers.
Contribution
It extends the GAN-based sign language synthesis of Stoll et al. with guidance from a human semantic parser and a periodic weighting scheme for the generator, improving video quality and generalization across signers.
Findings
Achieved an SSIM of 0.893 on the MS-ASL dataset
Improved video realism with region-guided synthesis
Periodic weighting of the generator reactivates training and yields quantitatively better results
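The SSIM figure above is the structural similarity index of Wang et al. (2004), which compares two images through their means, variances, and covariance. As a minimal sketch of what the metric measures, the hypothetical function `global_ssim` below evaluates the SSIM formula once over two flattened grayscale frames in pure Python. It is not the paper's evaluation code: standard SSIM averages the same formula over local Gaussian windows (11x11, sigma = 1.5 in Wang et al.), so values from this simplified global version are not directly comparable to 0.893.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window (global) SSIM between two equally sized, flattened
    grayscale images. Simplified sketch: no local Gaussian windowing."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (>= 2)")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Sample variances and covariance (n - 1 denominator).
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    # Stabilizing constants that avoid division by (near) zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    # SSIM = (2 mu_x mu_y + C1)(2 sigma_xy + C2) /
    #        ((mu_x^2 + mu_y^2 + C1)(sigma_x^2 + sigma_y^2 + C2))
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


if __name__ == "__main__":
    frame = [0.0, 50.0, 100.0, 150.0, 200.0, 250.0]
    print(global_ssim(frame, frame))            # identical frames: 1.0 (up to rounding)
    print(global_ssim(frame, frame[::-1]))      # inverted structure: much lower score
```

A score of 1 means the two frames are identical; lower scores indicate differences in luminance, contrast, or structure between the generated and the reference frame.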
Abstract
We propose a novel approach for synthesizing sign language videos using GANs. We extend the previous work of Stoll et al. by using the human semantic parser of the Soft-Gated Warping-GAN to produce photorealistic videos guided by region-level spatial layouts. Synthesizing target poses improves performance on independent and contrasting signers. We therefore evaluated our system on the highly heterogeneous MS-ASL dataset, which contains over 200 signers, obtaining an SSIM of 0.893. Furthermore, we introduce a periodic weighting approach for the generator that reactivates training and leads to quantitatively better results.
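The abstract does not spell out the periodic weighting scheme, so the following is only a sketch under stated assumptions. It assumes, as in pix2pix-style conditional GANs, that a weight lambda balances a pixel-level reconstruction term against the adversarial term in the generator loss, and that lambda is varied periodically over training steps. The cosine shape, the period of 10,000 steps, the bounds 1.0 and 100.0, and the names `periodic_lambda` and `generator_loss` are all hypothetical illustrations, not values or functions from the paper.

```python
import math


def periodic_lambda(step: int, period: int = 10_000,
                    lam_min: float = 1.0, lam_max: float = 100.0) -> float:
    """Hypothetical cosine schedule for the reconstruction weight lambda.

    Starts at lam_max at step 0, falls to lam_min at half a period, then
    rises back to lam_max, repeating every `period` steps. The periodic
    return to a high weight is one assumed way to 'reactivate' training.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    phase = (step % period) / period  # position within the current cycle, in [0, 1)
    return lam_min + 0.5 * (lam_max - lam_min) * (1.0 + math.cos(2.0 * math.pi * phase))


def generator_loss(adv_loss: float, recon_loss: float, step: int) -> float:
    """Assumed total generator loss: L_G = L_adv + lambda(step) * L_rec."""
    return adv_loss + periodic_lambda(step) * recon_loss


if __name__ == "__main__":
    for s in (0, 2_500, 5_000, 7_500, 10_000):
        print(s, round(periodic_lambda(s), 2))
```

In a real training loop, `adv_loss` and `recon_loss` would be framework tensors rather than floats; the schedule itself only needs the current step, so it can be computed outside the autograd graph.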
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Multimodal Machine Learning Applications
