TL;DR
This paper introduces a modular neural network that synthesizes realistic images of humans in unseen poses by separating, moving, and refining body parts, enabling both image and video generation across various actions.
Contribution
A modular generative neural network that synthesizes a person in an unseen pose from a single source image and a target pose. Its modules are trained jointly, and an adversarial discriminator pushes the output toward realistic detail.
Findings
Produces accurate pose transformations within and across action classes.
Generates coherent action videos from pose sequences.
Outperforms existing methods in realism and pose accuracy.
Abstract
We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single target image as a supervised label. We use an adversarial discriminator to force our network to synthesize realistic details conditioned on pose. We demonstrate image synthesis results on three action classes: golf,…
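The final compositing step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `composite`, the array shapes, and the mask-weighted blending rule are all assumptions chosen for illustration. It assumes the body-part layers have already been moved to their target locations and that the background has already been hole-filled.

```python
import numpy as np

def composite(part_layers, part_masks, background):
    """Blend moved body-part layers over a hole-filled background.

    Hypothetical interface (not taken from the paper):
      part_layers: (K, H, W, 3) float array, appearance of each moved part
      part_masks:  (K, H, W) float array in [0, 1], soft mask of each part
      background:  (H, W, 3) float array, background with the person removed
    """
    # Merge the per-part masks into a single foreground mask in [0, 1].
    fg_mask = np.clip(part_masks.sum(axis=0), 0.0, 1.0)[..., None]
    # Mask-weighted average of part appearances; the floor avoids 0/0
    # at pixels that no part covers.
    weights = part_masks[..., None]
    fg = (weights * part_layers).sum(axis=0) / np.maximum(weights.sum(axis=0), 1e-8)
    # Foreground where the mask is on, background elsewhere.
    return fg_mask * fg + (1.0 - fg_mask) * background

# Toy example: two parts on a 2x2 image, only the first part covers pixel (0, 0).
layers = np.ones((2, 2, 2, 3))
masks = np.zeros((2, 2, 2))
masks[0, 0, 0] = 1.0
bg = np.full((2, 2, 3), 0.5)
out = composite(layers, masks, bg)
```

In this toy example the covered pixel takes the part's appearance and every other pixel keeps the background value. In the network described above, the masks and layers would be produced by learned modules rather than set by hand.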
