From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

Zehuan Huang; Hongxing Fan; Lipeng Wang; Lu Sheng

arXiv:2404.15267 · cs.CV · April 24, 2024

TL;DR

Parts2Whole is a unified framework for controllable human image generation from multiple reference images. It combines a semantic-aware appearance encoder with a shared self-attention mechanism so that individual parts of a generated portrait can be customized precisely.

Contribution

The paper presents a novel framework that enables multi-part controllable human image generation using semantic-aware encoding and mask-informed attention, advancing beyond existing single-part or zero-shot methods.

Findings

01. Outperforms existing methods in multi-part controllable generation

02. Enables precise part selection through mask-aware attention

03. Supports diverse human appearance customization

Abstract

Recent advancements in controllable human image generation have led to zero-shot generation using structural signals (e.g., pose, depth) or facial appearance. Yet, generating human images conditioned on multiple parts of human appearance remains challenging. Addressing this, we introduce Parts2Whole, a novel framework designed for generating customized portraits from multiple reference images, including pose images and various aspects of human appearance. To achieve this, we first develop a semantic-aware appearance encoder to retain details of different human parts, which processes each image based on its textual label to a series of multi-scale feature maps rather than one image token, preserving the image dimension. Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism that operates across reference and target features during the…
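The abstract breaks off before it finishes describing the attention mechanism, so the following is only a minimal sketch of the general idea of shared self-attention with an optional reference mask, not the authors' implementation. It assumes that target tokens act as queries over the concatenation of target and reference tokens, and that a binary mask drops unselected reference tokens. The function names, the `ref_mask` parameter, and the use of raw features in place of learned query/key/value projections are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_self_attention(target, reference, ref_mask=None):
    """Let each target token attend over target + reference tokens.

    target, reference: lists of feature vectors (lists of floats, equal width).
    ref_mask: optional list of 0/1 flags, one per reference token. Tokens
        flagged 0 are excluded. This is a hypothetical stand-in for a part
        mask, not the paper's exact formulation.
    Raw features serve as queries, keys, and values here; a real attention
    layer would apply learned projections first (assumption).
    """
    if ref_mask is None:
        ref_mask = [1] * len(reference)
    # Keys/values: all target tokens plus the reference tokens that are kept.
    keys = target + [r for r, keep in zip(reference, ref_mask) if keep]
    scale = 1.0 / math.sqrt(len(target[0]))
    out = []
    for q in target:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors (identical to the keys here).
        out.append([sum(w * k[d] for w, k in zip(weights, keys))
                    for d in range(len(q))])
    return out


if __name__ == "__main__":
    target = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[5.0, 5.0], [-5.0, -5.0]]
    # Keep only the first reference token, as if the mask selected one part.
    print(shared_self_attention(target, reference, ref_mask=[1, 0]))
```

With every reference token masked out, the function reduces to ordinary self-attention over the target tokens. With a reference token kept, its features are mixed into the target outputs, which is the behavior the mask is meant to control in this sketch.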

Code & Models

Repositories

huanngzh/Parts2Whole
PyTorch · Official

Models

huanngzh/Parts2Whole
model · 5 downloads · 8 likes

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis

Methods: Diffusion