VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis

Jian Yu; Fei Shen; Cong Wang; Yi Xin; Si Shen; Xiaoyu Du; Jinhui Tang

arXiv:2604.07210 · cs.CV · April 9, 2026

TL;DR

VersaVogue introduces a unified diffusion-based framework for fashion image synthesis that enables disentangled, multi-condition control and improves realism without relying on human annotations.

Contribution

It proposes a trait-routing attention module for dynamic feature injection and an automated preference optimization pipeline for enhanced controllability and realism.
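The summary does not say how the automated preference optimization pipeline builds or uses its preference data. Purely as an illustration, and not as VersaVogue's objective, one common pattern is to score several generated candidates with an automatic evaluator, keep the best and worst as a (winner, loser) pair, and train with a DPO-style pairwise loss. The scorer, the log-likelihood inputs, and `beta` are hypothetical. For diffusion models the log-likelihood terms are usually replaced by denoising-error surrogates (as in Diffusion-DPO); here they are plain scalars.

```python
import numpy as np


def build_preference_pair(candidates, scores):
    """Pick the highest- and lowest-scoring candidates as (winner, loser).

    `scores` stands in for a hypothetical automatic evaluator, so no
    human preference labels are needed to form the pair.
    """
    scores = np.asarray(scores, dtype=float)
    winner = candidates[int(np.argmax(scores))]
    loser = candidates[int(np.argmin(scores))]
    return winner, loser


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style pairwise loss: -log sigmoid(beta * margin).

    The margin measures how much more the trained model prefers the winner
    over the loser than a frozen reference model does (assumed setup).
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # -log(sigmoid(x)) == log(1 + exp(-x)); logaddexp keeps this numerically stable
    return float(np.logaddexp(0.0, -beta * margin))


pair = build_preference_pair(["a", "b", "c"], [0.2, 0.9, 0.1])
print(pair)
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

With a zero margin the loss equals log 2 (about 0.693); it falls as the trained model favours the winner more strongly than the reference does.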

Findings

1. Outperforms existing methods in visual fidelity and semantic consistency.
2. Supports both garment generation and virtual dressing in a unified framework.
3. Achieves fine-grained attribute control without human-labeled data.

Abstract

Diffusion models have driven remarkable advancements in fashion image generation, yet prior works usually treat garment generation and virtual dressing as separate problems, limiting their flexibility in real-world fashion workflows. Moreover, fashion image synthesis under multi-source heterogeneous conditions remains challenging, as existing methods typically rely on simple feature concatenation or static layer-wise injection, which often causes attribute entanglement and semantic interference. To address these issues, we propose VersaVogue, a unified framework for multi-condition controllable fashion synthesis that jointly supports garment generation and virtual dressing, corresponding to the design and showcase stages of the fashion lifecycle. Specifically, we introduce a trait-routing attention (TA) module that leverages a mixture-of-experts mechanism to dynamically route condition…
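The abstract is cut off before it specifies how the trait-routing attention (TA) module routes condition features, so the following is only a generic top-k mixture-of-experts sketch under stated assumptions, not the paper's implementation. It assumes one expert per condition stream (each doing single-head cross-attention over that stream), a linear router `gate_w`, and residual injection of the gated expert outputs. The function names and shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, kv):
    """Single-head scaled dot-product attention of queries q (N, d) over kv (M, d)."""
    d = q.shape[-1]
    weights = softmax(q @ kv.T / np.sqrt(d), axis=-1)  # (N, M)
    return weights @ kv                                 # (N, d)


def trait_routing_sketch(hidden, conditions, gate_w, top_k=1):
    """Route each latent token to its top-k condition experts (hypothetical design).

    hidden:     (N, d) latent tokens
    conditions: list of K arrays, each (M_k, d); one condition stream per expert
    gate_w:     (d, K) hypothetical router weights
    """
    probs = softmax(hidden @ gate_w, axis=-1)  # (N, K) per-token routing probabilities
    # Each expert attends only to its own condition stream.
    expert_out = np.stack(
        [cross_attention(hidden, c) for c in conditions], axis=1
    )  # (N, K, d)
    # Keep the top-k experts per token and renormalize their gate weights.
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, topk_idx, 1.0, axis=-1)
    gates = probs * mask
    gates = gates / gates.sum(axis=-1, keepdims=True)
    # Residual injection of the gated mixture of expert outputs.
    return hidden + np.einsum("nk,nkd->nd", gates, expert_out)


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
conds = [rng.normal(size=(5, 8)) for _ in range(3)]  # e.g. three condition streams
W = rng.normal(size=(8, 3))
out = trait_routing_sketch(h, conds, W, top_k=1)
print(out.shape)
```

Routing per token, rather than fixing which condition is injected at which layer, is one way to avoid the static layer-wise injection the abstract criticizes; with `top_k=1` each token receives features from exactly one condition stream.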
