Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition
Yu He, Ting Zhu, Yichun Liu, Lichen Ma, Xinyuan Shan, Jingling Fu, Yu Shi, Junshi Huang, Yan Li

TL;DR
This paper introduces Fashion130k, a comprehensive e-commerce fashion dataset, and a Unified Multi-modal Condition (UMC) framework that aligns text and visual prompts for more consistent outfit generation.
Contribution
The paper presents a new dataset and a novel multi-modal embedding framework that enhances visual consistency in fashion outfit generation.
Findings
The Fashion130k dataset covers various occasions, models, and garment types.
The UMC framework effectively aligns multi-modal prompts for consistent outfit generation.
Experiments show UMC outperforms state-of-the-art methods in visual consistency.
Abstract
Recent research on fashion outfit generation focuses on improving the visual consistency of garments by leveraging key information from a reference image and a text prompt. However, the potential of outfit generation remains underexplored, as it requires a comprehensive e-commerce dataset and careful use of multi-modal conditions. In this paper, we propose a new e-commerce dataset, named Fashion130k, covering various occasions, models, and garment types. For consistent garment generation, we design a framework with Unified Multi-modal Condition (UMC) to align and integrate the text and visual prompts into the generation model. Specifically, we explore an embedding refiner to extract unified embeddings of the multi-modal prompts, within which a Fusion Transformer is proposed to align the multi-modal embeddings by adjusting the modality gap between text and image. Based on unified…
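The abstract mentions adjusting the modality gap between text and image embeddings. As a rough illustration only, the sketch below measures such a gap as the distance between the centroids of L2-normalized text and image embeddings, then shifts the text embeddings along the centroid difference. This is a generic diagnostic and not the paper's Fusion Transformer; the centroid-distance definition, the function names `modality_gap` and `shift_text_toward_image`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def l2_normalize(x):
    # Scale each row to unit length so only direction matters.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def modality_gap(text_emb, image_emb):
    # Assumed definition: Euclidean distance between the centroids
    # of the L2-normalized text and image embeddings.
    text_center = l2_normalize(text_emb).mean(axis=0)
    image_center = l2_normalize(image_emb).mean(axis=0)
    return float(np.linalg.norm(text_center - image_center))


def shift_text_toward_image(text_emb, image_emb, alpha=1.0):
    # Hypothetical adjustment: move every text embedding a fraction
    # `alpha` of the way along the centroid difference, then renormalize.
    t = l2_normalize(text_emb)
    v = l2_normalize(image_emb)
    delta = v.mean(axis=0) - t.mean(axis=0)
    return l2_normalize(t + alpha * delta)


# Synthetic embeddings: two clusters offset along different axes.
rng = np.random.default_rng(0)
text = rng.normal(size=(64, 8))
text[:, 0] += 3.0
image = rng.normal(size=(64, 8))
image[:, 1] += 3.0

gap_before = modality_gap(text, image)
gap_after = modality_gap(shift_text_toward_image(text, image), image)
print(f"gap before: {gap_before:.3f}, gap after: {gap_after:.3f}")
```

A learned alignment module would replace the fixed shift with trainable layers; the centroid distance is only one simple way to check whether two embedding sets have moved closer together.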
