Pose-Guided Sign Language Video GAN with Dynamic Lambda

Christopher Kissel; Christopher Kümmel; Dennis Ritter; Kristian Hildebrand

arXiv:2105.02742·cs.CV·May 7, 2021

TL;DR

This paper introduces a GAN-based method for synthesizing photorealistic sign language videos guided by target poses and region-level spatial layouts, improving realism and performance across diverse signers.

Contribution

It extends previous GAN models with pose guidance and a periodic weighting scheme, enhancing video quality and generalization across signers.

Findings

01

Achieved an SSIM of 0.893 on the MS-ASL dataset (over 200 signers)

02

Improved video realism with region-guided synthesis

03

Periodic weighting enhances training stability and results

Abstract

We propose a novel approach for the synthesis of sign language videos using GANs. We extend the previous work of Stoll et al. by using the human semantic parser of the Soft-Gated Warping-GAN to produce photorealistic videos guided by region-level spatial layouts. Synthesizing target poses improves performance on independent and contrasting signers. We therefore evaluated our system on the highly heterogeneous MS-ASL dataset, which contains over 200 signers, and obtained an SSIM of 0.893. Furthermore, we introduce a periodic weighting approach to the generator that reactivates the training and leads to quantitatively better results.
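For readers unfamiliar with the reported metric, the following is a minimal sketch of the structural similarity index (SSIM) in its simplest, single-window form. It is not the authors' evaluation code: the function name `global_ssim`, the flat-list image representation, and the 8-bit `data_range` default are assumptions made for illustration. Practical evaluations usually compute SSIM over local sliding windows and average the result, so values from this sketch are not directly comparable to the 0.893 reported above.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two grayscale images given as flat lists.

    Illustrative only: the whole image is treated as one window, whereas
    standard SSIM averages the same formula over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")

    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n

    # Population variances and covariance of the pixel intensities.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    # Stabilizing constants with the conventional K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    frame = [0, 50, 100, 200]                # hypothetical 2x2 image, flattened
    inverted = [255 - v for v in frame]      # intensity-inverted copy
    print(global_ssim(frame, frame))         # identical images score (close to) 1
    print(global_ssim(frame, inverted))      # dissimilar images score lower
```

The score is 1 only when the two images match and decreases as their means, variances, or correlation diverge, which is why a value near 0.9 indicates generated frames that are structurally close to the ground truth.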

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Multimodal Machine Learning Applications