Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition

Yu He; Ting Zhu; Yichun Liu; Lichen Ma; Xinyuan Shan; Jingling Fu; Yu Shi; Junshi Huang; Yan Li

arXiv:2605.10127·cs.CV·May 14, 2026

TL;DR

This paper introduces Fashion130k, a comprehensive e-commerce dataset, and a Unified Multi-modal Condition (UMC) framework that improves outfit generation by aligning text and visual prompts.

Contribution

The paper presents a new dataset and a novel multi-modal embedding framework that enhances visual consistency in fashion outfit generation.

Findings

01. Fashion130k dataset covers various occasions, models, and garment types.

02. UMC framework effectively aligns multi-modal prompts for consistent outfit generation.

03. Experiments show UMC outperforms state-of-the-art methods in visual consistency.

Abstract

Recent work on fashion outfit generation focuses on improving the visual consistency of garments by leveraging key information from a reference image and a text prompt. However, the potential of outfit generation remains underexplored; realizing it requires a comprehensive e-commerce dataset and careful use of multi-modal conditions. In this paper, we propose a new e-commerce dataset, named Fashion130k, that covers various occasions, models, and garment types. For consistent garment generation, we design a framework with a Unified Multi-modal Condition (UMC) that aligns and integrates the text and visual prompts into the generation model. Specifically, we explore an embedding refiner that extracts unified embeddings of the multi-modal prompts, within which a Fusion Transformer is proposed to align the multi-modal embeddings by adjusting the modality gap between text and image. Based on unified…
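
To make the phrase "modality gap between text and image" concrete, here is a minimal sketch, not the paper's method. It assumes one common way to quantify the gap: the difference between the centroids of L2-normalized text and image embeddings. The fixed shift in `shift_towards_text` is a hypothetical stand-in for the learned Fusion Transformer, whose details the abstract does not give; all function names and the toy vectors are illustrative assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def modality_gap(text_embs, image_embs):
    """Gap vector: text centroid minus image centroid (assumed definition)."""
    t = centroid(text_embs)
    i = centroid(image_embs)
    return [a - b for a, b in zip(t, i)]


def vector_norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def shift_towards_text(image_embs, gap, strength=1.0):
    """Hypothetical adjustment: translate image embeddings along the gap."""
    return [[x + strength * g for x, g in zip(v, gap)] for v in image_embs]


# Toy 2-D embeddings (illustrative values only, not from the paper).
text_embs = [l2_normalize(v) for v in [[1.0, 0.0], [0.8, 0.6]]]
image_embs = [l2_normalize(v) for v in [[0.0, 1.0], [0.6, 0.8]]]

gap = modality_gap(text_embs, image_embs)
shifted = shift_towards_text(image_embs, gap)
residual = modality_gap(text_embs, shifted)

print("gap norm before:", vector_norm(gap))
print("gap norm after: ", vector_norm(residual))
```

With `strength=1.0` the shifted image centroid coincides with the text centroid, so the residual gap is zero up to floating-point error. A learned aligner would instead adjust the gap per token and per sample rather than apply one global translation.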
