Deep Image Spatial Transformation for Person Image Generation
Yurui Ren, Xiaoming Yu, Junming Chen, Thomas H. Li, Ge Li

TL;DR
This paper introduces a differentiable global-flow local-attention framework for pose-guided person image generation. The framework performs spatial transformations at the feature level, and the authors report that it outperforms previous methods in image quality and also applies to video animation and view synthesis.
Contribution
The paper proposes a global-flow local-attention mechanism that reassembles source features at the feature level, addressing the limited ability of convolutional neural networks to spatially transform their inputs in person image generation.
Findings
Superior subjective and objective results compared to existing methods
Effective in video animation and view synthesis tasks
Demonstrates the model's versatility in spatial transformation applications
Abstract
Pose-guided person image generation is to transform a source person image to a target pose. This task requires spatial manipulations of source data. However, Convolutional Neural Networks are limited by the lack of ability to spatially transform the inputs. In this paper, we propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level. Specifically, our model first calculates the global correlations between sources and targets to predict flow fields. Then, the flowed local patch pairs are extracted from the feature maps to calculate the local attention coefficients. Finally, we warp the source features using a content-aware sampling method with the obtained local attention coefficients. The results of both subjective and objective experiments demonstrate the superiority of our model. Besides, additional results in video animation and view…
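The last two steps in the abstract (extracting flowed local patches, then sampling source features with local attention coefficients) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes the flow field is already given and uses integer offsets, it uses only the target feature at each location as the query instead of a full target patch, and it swaps in a softmax over dot products for the attention coefficients, since the abstract does not say how they are computed. The names `local_attention_warp`, `softmax`, and the parameter `n` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention_warp(source, target, flow, n=3):
    """Warp source features toward the target layout (illustrative only).

    source, target: H x W grids of C-dim feature vectors (nested lists).
    flow: H x W grid of integer (dy, dx) offsets from each target location
          to its matching source location (assumed given, not predicted).
    n: side length of the local source patch (odd).
    """
    h, w = len(source), len(source[0])
    r = n // 2

    def clamp(v, size):
        # Keep indices inside the feature map (border replication).
        return max(0, min(size - 1, v))

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = flow[y][x]
            query = target[y][x]
            # n x n source patch centred on the flowed location.
            patch = [
                source[clamp(y + dy + i, h)][clamp(x + dx + j, w)]
                for i in range(-r, r + 1)
                for j in range(-r, r + 1)
            ]
            # Assumed scoring rule: dot product between query and each
            # patch feature, normalised with softmax.
            scores = [sum(q * k for q, k in zip(query, feat)) for feat in patch]
            weights = softmax(scores)
            # Content-aware sampling: attention-weighted sum of the patch.
            row.append([
                sum(wt * feat[c] for wt, feat in zip(weights, patch))
                for c in range(len(query))
            ])
        out.append(row)
    return out
```

With `n=1` the function reduces to plain flow-based warping, because each output feature is copied from the single flowed source location. Larger `n` lets the attention weights choose among neighbouring source features.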
Videos
Deep Image Spatial Transformation for Person Image Generation (YouTube)
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
