OmniFashion: Towards Generalist Fashion Intelligence via Multi-Task Vision-Language Learning

Zhengwei Yang; Andi Long; Hao Li; Zechao Hu; Kui Jiang; Zheng Wang

arXiv:2603.02658 · cs.CV · March 4, 2026


Open Access

TL;DR

OmniFashion introduces a unified vision-language framework trained on a large, exhaustively annotated fashion dataset, enabling multi-task reasoning and dialogue to advance generalist fashion intelligence.

Contribution

The paper presents OmniFashion, a novel multi-task vision-language model for fashion that unifies diverse tasks and is trained on the new FashionX dataset with detailed annotations.

Findings

01. Achieves strong accuracy on multiple fashion tasks
02. Demonstrates effective cross-task generalization
03. Enables interactive fashion dialogue

Abstract

Fashion intelligence spans multiple tasks, i.e., retrieval, recommendation, recognition, and dialogue, yet remains hindered by fragmented supervision and incomplete fashion annotations. These limitations jointly restrict the formation of consistent visual-semantic structures, preventing recent vision-language models (VLMs) from serving as a generalist fashion brain that unifies understanding and reasoning across tasks. To address this, we construct FashionX, a million-scale dataset that exhaustively annotates the visible fashion items within an outfit and organizes their attributes from global to part-level. Building on this foundation, we propose OmniFashion, a unified vision-language framework that bridges diverse fashion tasks under a single fashion dialogue paradigm, enabling both multi-task reasoning and interactive dialogue. Experiments on multiple subtasks and retrieval benchmarks show that…
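The abstract describes two ideas only at a high level: annotations that cover every visible item in an outfit with attributes running from global to part-level, and a dialogue paradigm that casts different fashion tasks as conversation. The Python sketch here is a hypothetical illustration of how such a hierarchy might be flattened into question/answer turns. The class names, fields, the reading of "global" as item-wide attributes, and the prompt templates are all assumptions made for illustration; they are not the FashionX schema or OmniFashion's actual training format.

```python
from dataclasses import dataclass, field


@dataclass
class PartAnnotation:
    # Hypothetical part-level entry, e.g. the collar or sleeve of one garment.
    part: str
    attributes: dict[str, str]


@dataclass
class ItemAnnotation:
    # Hypothetical item-level entry: one visible fashion item in the outfit.
    category: str
    attributes: dict[str, str]  # item-wide ("global") attributes, by assumption
    parts: list[PartAnnotation] = field(default_factory=list)


@dataclass
class OutfitAnnotation:
    # One image: every visible item is listed, not only a single target product.
    image_id: str
    items: list[ItemAnnotation]


def to_dialogue(outfit: OutfitAnnotation) -> list[dict[str, str]]:
    """Flatten a hierarchical annotation into user/assistant turns.

    Illustrative only: the prompt wording and turn format are assumptions.
    """
    turns = []
    # Recognition-style turn: enumerate every visible item.
    categories = ", ".join(item.category for item in outfit.items)
    turns.append({"user": "Which fashion items are visible in this outfit?",
                  "assistant": categories})
    # Attribute turns, from item level down to part level.
    for item in outfit.items:
        desc = "; ".join(f"{k}: {v}" for k, v in item.attributes.items())
        turns.append({"user": f"Describe the {item.category}.",
                      "assistant": desc})
        for part in item.parts:
            pdesc = "; ".join(f"{k}: {v}" for k, v in part.attributes.items())
            turns.append({"user": f"What does the {part.part} of the {item.category} look like?",
                          "assistant": pdesc})
    return turns


# Usage with a toy, made-up annotation (not FashionX data).
outfit = OutfitAnnotation(
    image_id="demo-001",
    items=[
        ItemAnnotation("shirt", {"color": "white", "fit": "slim"},
                       [PartAnnotation("collar", {"type": "button-down"})]),
        ItemAnnotation("jeans", {"color": "blue"}),
    ],
)
for turn in to_dialogue(outfit):
    print(turn["user"], "->", turn["assistant"])
```

Keeping every item and part in one nested record means recognition, attribute description, and part-level queries can all be generated from the same annotation, which is one plausible way a shared dialogue format could tie otherwise separate tasks together.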


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis