MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization

Ankan Deria; Dwarikanath Mahapatra; Behzad Bozorgtabar; Mohna Chakraborty; Snehashis Chakraborty; Sudipta Roy

arXiv:2508.08488 · cs.CV · August 13, 2025

TL;DR

MuGa-VTON is a novel unified diffusion transformer framework for multi-garment virtual try-on that preserves identity and allows prompt-based customization, outperforming existing methods in realism and flexibility.

Contribution

Introduces MuGa-VTON, a multi-garment diffusion framework that models upper and lower garments together with person identity in a shared latent space and supports prompt-based customization.

Findings

01. Outperforms existing methods on the VITON-HD and DressCode benchmarks.

02. Produces high-fidelity, identity-preserving virtual try-on images.

03. Supports fine-grained garment modifications with minimal user input.

Abstract

Virtual try-on seeks to generate photorealistic images of individuals in desired garments, a task that must simultaneously preserve personal identity and garment fidelity for practical use in fashion retail and personalization. However, existing methods typically handle upper and lower garments separately, rely on heavy preprocessing, and often fail to preserve person-specific cues such as tattoos, accessories, and body shape, which limits realism and flexibility. To address this, we introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. Specifically, we propose three key modules: the Garment Representation Module (GRM) for capturing the semantics of both garments, the Person Representation Module (PRM) for encoding identity and pose cues, and the A-DiT fusion module, which…
