ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions
Shiyue Zhang, Zheng Chong, Xi Lu, Wenqing Zhang, Haoxiang Li, Xujie Zhang, Jiehui Huang, Xiao Dong, Xiaodan Liang

TL;DR
ComposeAnyone is a novel method for controllable human image generation that allows decoupled control over layout, text, and reference images, enhancing flexibility and precision in the process.
Contribution
It introduces a decoupled multimodal control framework and a new dataset for layout-to-human image generation, improving flexibility and multi-task capabilities.
Findings
Better alignment with layouts, texts, and references
Enhanced controllability and flexibility in human image synthesis
Demonstrated effectiveness across multiple datasets
Abstract
Building on the success of diffusion models, significant advancements have been made in multimodal image generation tasks. Among these, human image generation has emerged as a promising technique, offering the potential to revolutionize the fashion design process. However, existing methods often focus solely on text-to-image or image reference-based human generation, which fails to satisfy the increasingly sophisticated demands. To address the limitations of flexibility and precision in human generation, we introduce ComposeAnyone, a controllable layout-to-human generation method with decoupled multimodal conditions. Specifically, our method allows decoupled control of any part in hand-drawn human layouts using text or reference images, seamlessly integrating them during the generation process. The hand-drawn layout, which utilizes color-blocked geometric shapes such as ellipses and…
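To make the idea of a color-blocked layout with per-part conditions more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the `Part` class, the part names, the colors, and the helper functions are all hypothetical, and it uses ellipses only because the description of the other shapes is truncated above. It shows one way a layout could be rasterized into color blocks while each part carries either a text or a reference-image condition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]


@dataclass
class Part:
    """One body part in a hypothetical hand-drawn layout (illustrative only)."""
    name: str                        # assumed part label, e.g. "upper"
    color: RGB                       # color block that identifies the part
    center: Tuple[float, float]      # ellipse center (cx, cy) in pixels
    radii: Tuple[float, float]       # ellipse radii (rx, ry) in pixels
    text: Optional[str] = None       # text condition for this part, if any
    reference: Optional[str] = None  # reference image path, if any


def rasterize(parts: List[Part], width: int, height: int,
              background: RGB = (255, 255, 255)) -> List[List[RGB]]:
    """Paint each part as a filled ellipse; later parts overwrite earlier ones."""
    canvas = [[background for _ in range(width)] for _ in range(height)]
    for part in parts:
        cx, cy = part.center
        rx, ry = part.radii
        for y in range(height):
            for x in range(width):
                # Standard point-in-ellipse test.
                if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0:
                    canvas[y][x] = part.color
    return canvas


def part_conditions(parts: List[Part]) -> Dict[str, Tuple[str, str]]:
    """Map each part name to its (modality, value) condition, text taking priority."""
    conditions = {}
    for part in parts:
        if part.text is not None:
            conditions[part.name] = ("text", part.text)
        elif part.reference is not None:
            conditions[part.name] = ("image", part.reference)
    return conditions


# Example layout: one part driven by text, one by a reference image.
layout = [
    Part("face", (255, 0, 0), (8, 4), (3, 3), text="short black hair"),
    Part("upper", (0, 255, 0), (8, 12), (5, 4), reference="shirt.png"),
]
canvas = rasterize(layout, width=16, height=20)
conditions = part_conditions(layout)
```

In this sketch the color of a block is what links a region of the canvas to its condition, so editing one part's text or reference leaves the other parts untouched. How the actual method encodes and fuses these conditions is not specified in the text above.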
Taxonomy
Topics: Interactive and Immersive Displays · Tactile and Sensory Interactions · Social Robot Interaction and HRI
Methods: Diffusion · Focus
