UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation

Xiangyu Zhao; Yuehan Zhang; Wenlong Zhang; Xiao-Ming Wu

arXiv:2408.11305 · cs.CV · October 15, 2024

TL;DR

UniFashion is a comprehensive multimodal fashion model that unifies retrieval and generation tasks, leveraging diffusion and language models to improve performance and adaptability in fashion-related vision-language applications.

Contribution

It introduces a novel unified framework combining image generation, retrieval, and text tasks in fashion, surpassing previous single-task models and enabling complex multimodal applications.

Findings

01

Outperforms previous state-of-the-art models across fashion tasks

02

Successfully integrates image generation with retrieval and text tasks

03

Demonstrates potential for complex vision-language applications in fashion

Abstract

The fashion domain encompasses a variety of real-world multimodal tasks, including multimodal retrieval and multimodal generation. Rapid advances in AI-generated content, particularly large language models for text generation and diffusion models for visual generation, have sparked widespread research interest in applying these multimodal models to the fashion domain. However, tasks involving embeddings, such as image-to-text or text-to-image retrieval, have been largely overlooked from this perspective due to the diverse nature of the multimodal fashion domain. Moreover, current research on multi-task single models lacks a focus on image generation. In this work, we present UniFashion, a unified framework that simultaneously tackles the challenges of multimodal generation and retrieval tasks within the fashion domain, integrating image…
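The abstract singles out embedding-based tasks such as image-to-text and text-to-image retrieval. As a generic illustration, and not UniFashion's own implementation, cross-modal retrieval over a shared embedding space usually comes down to ranking gallery items by cosine similarity to a query embedding. The sketch below assumes the embeddings have already been produced by some encoder; the vectors, item IDs, and function names (`retrieve`, `cosine_similarity`, `l2_normalize`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve(
    query_emb: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top-k gallery items ranked by cosine similarity to the query.

    The same routine serves text-to-image retrieval (text query, image gallery)
    and image-to-text retrieval (image query, caption gallery), assuming both
    modalities live in one shared embedding space.
    """
    scored = [(item_id, cosine_similarity(query_emb, emb)) for item_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-d embeddings; a real encoder would output hundreds of dimensions.
text_query = [0.9, 0.1, 0.0]  # stands in for a caption like "red floral dress"
gallery = {
    "img_dress_red": [0.8, 0.2, 0.1],
    "img_jacket_denim": [0.0, 0.3, 0.9],
    "img_skirt_floral": [0.6, 0.5, 0.0],
}
top = retrieve(text_query, gallery, k=2)
print([item for item, _ in top])  # prints ['img_dress_red', 'img_skirt_floral']
```

Normalizing once and taking dot products is the usual design choice here, because it makes the ranking independent of embedding magnitude; with large galleries the same idea is typically applied as a single matrix product over pre-normalized embeddings.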

Code & Models

Repositories

xiangyu-mm/unifashion
PyTorch · Official

Videos

UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation · Underline

Taxonomy

Topics: Human Motion and Animation

Methods: Diffusion · Focus