Improving Human Image Synthesis with Residual Fast Fourier Transformation and Wasserstein Distance
Jianhan Wu, Shijing Si, Jianzong Wang, Jing Xiao

TL;DR
This paper introduces novel techniques using Residual Fast Fourier Transform blocks and Wasserstein distance to enhance the realism and training stability of GAN-based human image synthesis, achieving state-of-the-art results.
Contribution
The paper proposes replacing traditional residual blocks with Residual Fast Fourier Transform blocks and applying spectral normalization with Wasserstein distance to improve GAN training stability and image quality.
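To make the block replacement concrete, here is a minimal one-dimensional sketch in plain Python of a residual block with an added frequency-domain branch. It assumes the common Residual FFT layout (identity path, spatial convolution branch, and a branch that transforms the signal in Fourier space before inverting it). The summary does not spell out the paper's exact block, so the function names, the 3-tap kernel, and the two scalar weights standing in for learned 1x1 convolutions are all illustrative assumptions.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def frequency_branch(x, w_real, w_imag):
    """Scale the real and imaginary spectrum parts, then return to signal space.

    Assumption: two scalars replace the learned 1x1 convolutions that a real
    block would apply to the stacked real/imaginary channels.
    """
    spectrum = dft(x)
    scaled = [complex(w_real * c.real, w_imag * c.imag) for c in spectrum]
    return [value.real for value in idft(scaled)]


def spatial_branch(x, kernel):
    """Circular convolution with a small kernel, followed by ReLU."""
    n = len(x)
    half = len(kernel) // 2
    out = []
    for i in range(n):
        acc = sum(kernel[j] * x[(i + j - half) % n] for j in range(len(kernel)))
        out.append(max(0.0, acc))
    return out


def res_fft_block(x, kernel, w_real, w_imag):
    """Sum of the identity path, the spatial branch, and the frequency branch."""
    spatial = spatial_branch(x, kernel)
    frequency = frequency_branch(x, w_real, w_imag)
    return [xi + si + fi for xi, si, fi in zip(x, spatial, frequency)]


if __name__ == "__main__":
    signal = [1.0, -2.0, 0.5, 3.0]
    print(res_fft_block(signal, [0.25, 0.5, 0.25], 0.9, 0.9))
```

Because every frequency bin depends on every input sample, the frequency branch gives each output position a global view of the signal, while the spatial branch only sees a local neighborhood. A practical implementation would use a 2D real FFT from a tensor library rather than this naive transform.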
Findings
Enhanced rendering quality with Residual FFT blocks.
Improved training stability and convergence speed.
Achieved state-of-the-art LPIPS and PSNR scores.
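The training-stability finding relates to two ingredients named in the contribution: spectral normalization and the Wasserstein distance. The sketch below, in plain Python, assumes the standard power-iteration estimate of a weight matrix's largest singular value and the standard WGAN critic and generator losses. The paper's exact formulation is not given in this summary, and the function names are illustrative rather than taken from the paper.

```python
import math


def spectral_norm(matrix, iterations=50):
    """Estimate the largest singular value of `matrix` by power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # u <- W v, normalized
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(a * a for a in u))
        if u_norm == 0.0:
            return 0.0
        u = [a / u_norm for a in u]
        # v <- W^T u, normalized
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v_norm = math.sqrt(sum(a * a for a in v))
        if v_norm == 0.0:
            return 0.0
        v = [a / v_norm for a in v]
    w_v = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(a * a for a in w_v))


def spectrally_normalize(matrix):
    """Divide the weights by the estimated spectral norm (largest singular value)."""
    sigma = spectral_norm(matrix)
    if sigma == 0.0:
        return [row[:] for row in matrix]
    return [[w / sigma for w in row] for row in matrix]


def mean(values):
    return sum(values) / len(values)


def critic_loss(real_scores, fake_scores):
    """WGAN critic loss: minimizing it widens the real-vs-fake score gap."""
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores):
    """WGAN generator loss: minimizing it raises the critic's score on fakes."""
    return -mean(fake_scores)


if __name__ == "__main__":
    weights = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(weights))
    print(spectrally_normalize(weights))
    print(critic_loss([1.0, 3.0], [0.0, 1.0]))
```

Dividing each layer's weights by their spectral norm bounds how much that layer can stretch its input, which is one common way to keep the critic approximately Lipschitz, as the Wasserstein formulation requires.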
Abstract
With the rapid development of the Metaverse, virtual humans have emerged, and human image synthesis and editing techniques, such as pose transfer, have recently become popular. Most existing techniques rely on GANs, which can generate good human images even under large variations and occlusions. To the best of our knowledge, however, the existing state-of-the-art method still has two problems. First, the rendering of the synthetic image is not realistic, with some regions rendered poorly. Second, GAN training is unstable and slow to converge, and can suffer from mode collapse. We propose several methods to address these two problems. To improve the rendering effect, we use the Residual Fast Fourier Transform Block to replace the traditional Residual Block. Then, spectral normalization and Wasserstein distance are used to improve the…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition
Methods: Residual Connection · Convolution · Spectral Normalization · Batch Normalization · Residual Block
