From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
Zehuan Huang, Hongxing Fan, Lipeng Wang, Lu Sheng

TL;DR
Parts2Whole is a unified framework for controllable human image generation from multiple reference images. It combines a semantic-aware appearance encoder with shared self-attention across reference and target features, so that individual parts of the generated portrait can be customized.
Contribution
The paper presents a framework for multi-part controllable human image generation built on semantic-aware encoding and mask-informed attention. It goes beyond existing methods that condition only on structural signals (e.g., pose, depth) or on a single appearance cue such as the face.
Findings
Outperforms existing methods in multi-part controllable generation
Enables precise part selection through mask-aware attention
Supports diverse human appearance customization
Abstract
Recent advancements in controllable human image generation have led to zero-shot generation using structural signals (e.g., pose, depth) or facial appearance. Yet, generating human images conditioned on multiple parts of human appearance remains challenging. Addressing this, we introduce Parts2Whole, a novel framework designed for generating customized portraits from multiple reference images, including pose images and various aspects of human appearance. To achieve this, we first develop a semantic-aware appearance encoder to retain details of different human parts, which processes each image based on its textual label to a series of multi-scale feature maps rather than one image token, preserving the image dimension. Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism that operates across reference and target features during the…
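The shared self-attention idea described above can be illustrated with a minimal single-head sketch. This is not the authors' implementation: the abstract is truncated here and does not give the exact formulation, so the function names (`softmax`, `shared_self_attention`), the projection matrices, and the boolean mask convention (True marks a reference token that belongs to the selected part) are all assumptions made for illustration. Target queries attend over the concatenation of target and reference tokens, and reference tokens outside the mask are excluded.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_self_attention(target, reference, w_q, w_k, w_v, ref_mask=None):
    """Illustrative single-head attention shared across target and reference.

    target:    (n_t, d) target feature tokens
    reference: (n_r, d) reference feature tokens
    ref_mask:  (n_r,) bool, True = reference token may be attended to
               (hypothetical convention, not taken from the paper)
    Returns the attended target features and the attention weights.
    """
    q = target @ w_q
    # Keys and values come from both target and reference tokens.
    tokens = np.concatenate([target, reference], axis=0)
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if ref_mask is not None:
        # Target tokens are always visible; reference tokens follow the mask.
        keep = np.concatenate([np.ones(len(target), dtype=bool), ref_mask])
        scores = np.where(keep[None, :], scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Example with random features: 4 target tokens, 6 reference tokens.
rng = np.random.default_rng(0)
dim = 8
target_tokens = rng.normal(size=(4, dim))
reference_tokens = rng.normal(size=(6, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
part_mask = np.array([True, True, False, False, True, False])
attended, attn = shared_self_attention(
    target_tokens, reference_tokens, w_q, w_k, w_v, part_mask
)
```

In this sketch the mask is applied to the attention scores before the softmax, so masked reference tokens receive zero weight while each row of weights still sums to one. A real diffusion model would use multi-head attention inside the denoising network at several feature scales.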
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
Methods: Diffusion
