DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On
Xiang Xu

TL;DR
DiffFit introduces a two-stage latent diffusion approach for virtual try-on, combining geometry-aware garment warping with texture refinement to produce highly realistic and well-aligned images of dressed humans.
Contribution
DiffFit proposes a two-stage framework that separates geometric alignment from appearance refinement, with the aim of improving realism and garment-body alignment in virtual try-on.
Findings
Outperforms state-of-the-art methods in quantitative metrics
Achieves superior visual realism and garment detail preservation
Demonstrates robustness across diverse poses and clothing styles
Abstract
Virtual try-on (VTON) aims to synthesize realistic images of a person wearing a target garment, with broad applications in e-commerce and digital fashion. While recent advances in latent diffusion models have substantially improved visual quality, existing approaches still struggle with preserving fine-grained garment details, achieving precise garment-body alignment, maintaining inference efficiency, and generalizing to diverse poses and clothing styles. To address these challenges, we propose DiffFit, a novel two-stage latent diffusion framework for high-fidelity virtual try-on. DiffFit adopts a progressive generation strategy: the first stage performs geometry-aware garment warping, aligning the garment with the target body through fine-grained deformation and pose adaptation. The second stage refines texture fidelity via a cross-modal conditional diffusion model that integrates the…
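The two-stage structure described in the abstract (warp the garment first, then refine its appearance) can be illustrated with a toy sketch. This is not the paper's method: the real stages are latent diffusion models, whereas the stand-ins below are a rigid pixel shift and a fixed alpha blend. The names `warp_garment`, `refine_texture`, `try_on`, and the `alpha` parameter are hypothetical and chosen only for illustration.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image, row-major (assumption)


def warp_garment(garment: Image, dx: int, dy: int) -> Image:
    """Stage 1 stand-in: shift the garment by (dx, dy) pixels, zero-filling.

    A geometry-aware warp would be a learned, dense deformation; a rigid
    shift is used here only to mark where alignment happens.
    """
    h, w = len(garment), len(garment[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = garment[sy][sx]
    return out


def refine_texture(person: Image, warped: Image, alpha: float = 0.8) -> Image:
    """Stage 2 stand-in: blend the warped garment over the person image.

    Pixels where the warped garment is zero keep the person's value.
    """
    return [
        [alpha * g + (1 - alpha) * p if g > 0 else p for p, g in zip(prow, grow)]
        for prow, grow in zip(person, warped)
    ]


def try_on(person: Image, garment: Image, dx: int, dy: int) -> Image:
    """Run the two stages in sequence: align first, then refine appearance."""
    warped = warp_garment(garment, dx, dy)
    return refine_texture(person, warped)
```

Keeping the stages as separate functions mirrors the disentanglement the paper's title refers to: the alignment step can be swapped out without touching the appearance step, and vice versa.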
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis
