MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization
Ankan Deria, Dwarikanath Mahapatra, Behzad Bozorgtabar, Mohna Chakraborty, Snehashis Chakraborty, Sudipta Roy

TL;DR
MuGa-VTON is a novel unified diffusion transformer framework for multi-garment virtual try-on that preserves identity and allows prompt-based customization, outperforming existing methods in realism and flexibility.
Contribution
Introduces MuGa-VTON, a multi-garment diffusion framework that models upper and lower garments together with person identity in a shared latent space and supports prompt-based customization.
Findings
Outperforms existing methods on VITON-HD and DressCode benchmarks.
Produces high-fidelity, identity-preserving virtual try-on images.
Supports fine-grained garment modifications with minimal user input.
Abstract
Virtual try-on seeks to generate photorealistic images of individuals in desired garments, a task that must simultaneously preserve personal identity and garment fidelity for practical use in fashion retail and personalization. However, existing methods typically handle upper and lower garments separately, rely on heavy preprocessing, and often fail to preserve person-specific cues such as tattoos, accessories, and body shape, resulting in limited realism and flexibility. To this end, we introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. Specifically, we propose three key modules: the Garment Representation Module (GRM) for capturing the semantics of both garments, the Person Representation Module (PRM) for encoding identity and pose cues, and the A-DiT fusion module, which…
