FashionComposer: Compositional Fashion Image Generation
Sihui Ji, Yiyang Wang, Xi Chen, Xiaogang Xu, Hao Luo, Hengshuang Zhao

TL;DR
FashionComposer introduces a flexible, multi-modal framework for compositional fashion image generation. In a single pass it can personalize human appearance, assign multiple garments, and customize pose, while handling diverse inputs robustly.
Contribution
It develops a universal, scalable framework that combines a reference UNet with subject-binding attention for multi-modal, multi-reference fashion image synthesis.
Findings
Supports arbitrary numbers and types of reference images.
Enables personalized human appearance and pose customization.
Facilitates diverse applications like virtual try-on and human album generation.
Abstract
We present FashionComposer for compositional fashion image generation. Unlike previous methods, FashionComposer is highly flexible. It takes multi-modal input (i.e., text prompt, parametric human model, garment image, and face image) and supports personalizing the appearance, pose, and figure of the human and assigning multiple garments in one pass. To achieve this, we first develop a universal framework capable of handling diverse input modalities. We then scale up the training data to strengthen the model's compositional capabilities. To accommodate multiple reference images (garments and faces) seamlessly, we organize these references in a single image as an "asset library" and employ a reference UNet to extract appearance features. To inject the appearance features into the correct pixels in the generated result, we propose subject-binding attention. It binds the appearance…
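The asset-library idea (placing every reference garment and face into one image so that a single reference UNet pass sees them all) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: references are tiled left to right at a shared height, and the hypothetical helper `build_asset_library` records each reference's column span so that a later stage could tell which region belongs to which subject. The layout, resolution, and padding FashionComposer actually uses are not stated here.

```python
import numpy as np


def build_asset_library(references, height=256):
    """Tile reference images side by side into a single "asset library" canvas.

    Assumption (not from the paper): each reference is an HxWx3 uint8 array
    that has already been resized to the shared `height`; widths may differ.

    Returns the canvas and a dict mapping each reference name to its
    (x_start, x_end) column span on the canvas.
    """
    if not references:
        raise ValueError("need at least one reference image")
    spans = {}
    tiles = []
    x = 0
    for name, img in references:
        if img.ndim != 3 or img.shape[0] != height:
            raise ValueError(f"{name}: expected height {height}, got shape {img.shape}")
        width = img.shape[1]
        spans[name] = (x, x + width)  # remember where this subject lives
        tiles.append(img)
        x += width
    # Concatenate along the width axis to form one image.
    canvas = np.concatenate(tiles, axis=1)
    return canvas, spans


if __name__ == "__main__":
    face = np.zeros((256, 128, 3), dtype=np.uint8)
    shirt = np.full((256, 192, 3), 255, dtype=np.uint8)
    canvas, spans = build_asset_library([("face", face), ("shirt", shirt)])
    print(canvas.shape, spans)
```

Keeping explicit spans is one simple way to preserve subject identity after concatenation. How FashionComposer itself associates regions with subjects is the job of its subject-binding attention, whose description is truncated in the abstract.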
Taxonomy
Topics: Fashion and Cultural Textiles
