ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions

Shiyue Zhang; Zheng Chong; Xi Lu; Wenqing Zhang; Haoxiang Li; Xujie Zhang; Jiehui Huang; Xiao Dong; Xiaodan Liang

arXiv:2501.12173·cs.CV·January 22, 2025

TL;DR

ComposeAnyone is a novel method for controllable human image generation that allows decoupled control over layout, text, and reference images, enhancing flexibility and precision in the process.

Contribution

It introduces a decoupled multimodal control framework and a new dataset for layout-to-human image generation, improving flexibility and multi-task capabilities.

Findings

1. Better alignment with layouts, texts, and references
2. Enhanced controllability and flexibility in human image synthesis
3. Demonstrated effectiveness across multiple datasets

Abstract

Building on the success of diffusion models, significant advancements have been made in multimodal image generation tasks. Among these, human image generation has emerged as a promising technique, offering the potential to revolutionize the fashion design process. However, existing methods often focus solely on text-to-image or image reference-based human generation, which fails to satisfy the increasingly sophisticated demands. To address the limitations of flexibility and precision in human generation, we introduce ComposeAnyone, a controllable layout-to-human generation method with decoupled multimodal conditions. Specifically, our method allows decoupled control of any part in hand-drawn human layouts using text or reference images, seamlessly integrating them during the generation process. The hand-drawn layout, which utilizes color-blocked geometric shapes such as ellipses and…
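
To make the layout idea concrete, here is a minimal sketch of a color-blocked layout built from ellipses, the one shape the abstract names before it is cut off. It is not the authors' code: the part names, the palette, the canvas size, and the helpers `draw_ellipse`, `make_layout`, and `part_mask` are all assumptions chosen for illustration. The per-part masks show one plausible way a layout could tell a generator which region each text or reference-image condition applies to.

```python
# Hypothetical part-to-color palette; the paper's actual color coding is not
# specified in this abstract, so these values are illustrative assumptions.
PALETTE = {
    "background": (0, 0, 0),
    "head": (255, 200, 150),
    "upper_body": (200, 30, 30),
    "lower_body": (30, 30, 200),
}


def draw_ellipse(canvas, center, radii, color):
    """Fill an axis-aligned ellipse on a row-major grid of RGB tuples."""
    cx, cy = center
    rx, ry = radii
    for y, row in enumerate(canvas):
        for x in range(len(row)):
            # Standard ellipse membership test: (dx/rx)^2 + (dy/ry)^2 <= 1.
            if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0:
                row[x] = color


def make_layout(width=64, height=96):
    """Build a toy three-part human layout; later shapes overwrite earlier ones."""
    canvas = [[PALETTE["background"]] * width for _ in range(height)]
    draw_ellipse(canvas, (32, 14), (8, 10), PALETTE["head"])
    draw_ellipse(canvas, (32, 42), (14, 18), PALETTE["upper_body"])
    draw_ellipse(canvas, (32, 76), (12, 18), PALETTE["lower_body"])
    return canvas


def part_mask(canvas, part):
    """Return a boolean mask selecting the pixels painted with one part's color."""
    color = PALETTE[part]
    return [[pixel == color for pixel in row] for row in canvas]


if __name__ == "__main__":
    layout = make_layout()
    for part in ("head", "upper_body", "lower_body"):
        area = sum(sum(row) for row in part_mask(layout, part))
        print(f"{part}: {area} pixels")
```

Because each part has its own color, a separate condition (a text prompt or a reference image) could be attached to each mask independently. How ComposeAnyone actually encodes and injects these conditions is not described in the truncated abstract.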

Code & Models

Repositories

zhangshy1019/composeanyone (Official)

Taxonomy

Topics: Interactive and Immersive Displays · Tactile and Sensory Interactions · Social Robot Interaction and HRI

Methods: Diffusion · Focus