UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation
Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Wayne Wu, Ziwei Liu

TL;DR
UnitedHuman introduces an end-to-end framework that leverages multi-source, multi-resolution datasets to improve high-resolution human image generation. It targets the poor synthesis of local details such as faces and hands.
Contribution
The paper proposes a novel Multi-Source Spatial Transformer and a continuous GAN framework to align and utilize diverse datasets for enhanced human image synthesis.
Findings
Generates higher-quality human images than methods trained on a single holistic human dataset
Effectively aligns multi-source images with a human model
Demonstrates superior performance through extensive experiments
Abstract
Human generation has achieved significant progress. Nonetheless, existing methods still struggle to synthesize specific regions such as faces and hands. We argue that the main reason is rooted in the training data. A holistic human dataset inevitably has insufficient and low-resolution information on local parts. Therefore, we propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model. However, multi-source data inherently a) contains different parts that do not spatially align into a coherent human, and b) comes with different scales. To tackle these challenges, we propose an end-to-end framework, UnitedHuman, that empowers a continuous GAN with the ability to effectively utilize multi-source data for high-resolution human generation. Specifically, 1) we design a Multi-Source Spatial Transformer that spatially aligns…
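The abstract names two problems: parts from different sources must be placed onto one coherent human, and those sources arrive at different scales. The sketch below illustrates the geometry only, under stated assumptions. It assumes each source image carries two keypoints whose positions on a shared, normalized human canvas are already known. This is a hypothetical stand-in for the paper's learned Multi-Source Spatial Transformer and human model. The sketch estimates a 2D similarity transform from those two correspondences, maps pixel centres into canvas coordinates, and reports the canvas resolution each source implies. All function names and keypoint values are illustrative assumptions, not the authors' code.

```python
"""Hypothetical sketch: placing part crops of different resolutions on one shared canvas.

Assumption: the canvas is a normalized full-body frame with coordinates in [0, 1], and
two keypoints per source image have known canvas positions. UnitedHuman itself learns
this alignment; nothing here reproduces its actual module.
"""
from typing import List, Tuple

Point = Tuple[float, float]


def similarity_from_two_points(src: Tuple[Point, Point],
                               dst: Tuple[Point, Point]) -> Tuple[complex, complex]:
    """Return (a, b) such that dst = a * src + b, treating 2D points as complex numbers.

    A complex multiplier `a` encodes uniform scale plus rotation; `b` is the translation,
    so two correspondences determine the similarity transform exactly.
    """
    z0, z1 = complex(*src[0]), complex(*src[1])
    w0, w1 = complex(*dst[0]), complex(*dst[1])
    if z0 == z1:
        raise ValueError("source keypoints must be distinct")
    a = (w1 - w0) / (z1 - z0)
    b = w0 - a * z0
    return a, b


def apply(a: complex, b: complex, p: Point) -> Point:
    """Map one source pixel position p into canvas coordinates."""
    w = a * complex(*p) + b
    return (w.real, w.imag)


def patch_grid(a: complex, b: complex, width: int, height: int) -> List[List[Point]]:
    """Canvas coordinates of every pixel centre of a width x height source image."""
    return [[apply(a, b, (x + 0.5, y + 0.5)) for x in range(width)]
            for y in range(height)]


def implied_canvas_resolution(a: complex) -> float:
    """Source pixels per canvas unit: each source pixel spans |a| canvas units."""
    return 1.0 / abs(a)


if __name__ == "__main__":
    # Hypothetical 256x256 face crop: eye keypoints in crop pixels vs. on the canvas.
    face_a, face_b = similarity_from_two_points(((88, 112), (168, 112)),
                                                ((0.47, 0.08), (0.53, 0.08)))
    # Hypothetical 512-pixel-tall full-body image: top and bottom edges span the canvas.
    body_a, _ = similarity_from_two_points(((0, 0), (0, 512)), ((0, 0), (0, 1)))
    print("face crop:", round(implied_canvas_resolution(face_a), 1), "px per canvas unit")
    print("full body:", round(implied_canvas_resolution(body_a), 1), "px per canvas unit")
```

With these assumed values, the 256-pixel face crop implies roughly 1333 pixels per canvas unit. The 512-pixel full-body image implies 512. That is the kind of scale mismatch a generator defined over continuous coordinates is presumably meant to absorb, since every source then supervises the same coordinate space at its own density.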
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · CutMix · Label Smoothing · Dropout · ALIGN · Byte Pair Encoding · Absolute Position Encodings · Dense Connections
