TL;DR
This paper introduces a novel neural framework for high-fidelity, temporally consistent human motion transfer from monocular videos, effectively capturing clothing dynamics and pose-dependent details for realistic video synthesis.
Contribution
The authors propose a three-stage image generation approach with recurrent neural networks that improves realism and temporal consistency in human motion transfer, especially for loose garments.
Findings
Outperforms state-of-the-art methods in video realism
Handles clothing dynamics and pose-dependent details effectively
Provides artistic control over generated results
Abstract
Video-based human motion transfer creates video animations of humans following a source motion. Current methods show remarkable results for tightly-clad subjects. However, the lack of temporally consistent handling of plausible clothing dynamics, including fine and high-frequency details, significantly limits the attainable visual quality. We address these limitations for the first time in the literature and present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations, for several types of loose garments. In contrast to the previous techniques, we perform image generation in three subsequent stages, synthesizing human shape, structure, and appearance. Given a monocular RGB video of an actor, we train a stack of recurrent deep neural networks that generate these intermediate representations from 2D…
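The staged, recurrent generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: `make_toy_stage` is a hypothetical stand-in for a trained recurrent network, the per-frame input vectors are placeholders for whatever conditioning the method actually uses, and the blending rule is an assumption chosen only to show how each stage feeds the next while carrying its own state from frame to frame.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A stage maps (input for this frame, its own previous output) -> new output.
Stage = Callable[[Vector, Vector], Vector]


def make_toy_stage(blend: float) -> Stage:
    """Hypothetical stand-in for a trained recurrent network.

    It mixes the current input with the stage's previous output, which is
    about the simplest possible form of temporal recurrence.
    """
    def stage(current: Vector, previous: Vector) -> Vector:
        return [blend * c + (1.0 - blend) * p for c, p in zip(current, previous)]
    return stage


def run_pipeline(inputs: Sequence[Vector], stages: Sequence[Stage]) -> List[Vector]:
    """Pass every frame through the stages in order (shape, structure, appearance)."""
    dim = len(inputs[0])
    # One recurrent state per stage, initialised to zeros.
    states = [[0.0] * dim for _ in stages]
    frames = []
    for frame_input in inputs:
        signal = frame_input
        for i, stage in enumerate(stages):
            signal = stage(signal, states[i])
            states[i] = signal  # carried over to the next frame
        frames.append(signal)
    return frames


# Three toy stages standing in for the shape, structure, and appearance networks.
shape_net = make_toy_stage(0.5)
structure_net = make_toy_stage(0.5)
appearance_net = make_toy_stage(0.5)

frames = run_pipeline(
    [[1.0, 1.0], [1.0, 1.0]],
    [shape_net, structure_net, appearance_net],
)
```

Even with identical inputs, the second output frame differs from the first because each stage remembers its previous output. That dependence on earlier frames is the kind of mechanism a recurrent design relies on for temporal consistency.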
Taxonomy
Methods: Attentive Walk-Aggregating Graph Neural Network
