DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On

Xiang Xu

arXiv:2506.23295 · cs.CV · July 1, 2025

Open Access

TL;DR

DiffFit introduces a two-stage latent diffusion approach for virtual try-on, combining geometry-aware garment warping with texture refinement to produce highly realistic and well-aligned images of dressed humans.

Contribution

The paper proposes a two-stage framework that separates geometric alignment (garment warping) from appearance refinement (texture), with the aim of improving realism and garment-body alignment in virtual try-on.
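To make the separation of the two stages concrete, here is a minimal sketch of the data flow only. It is not the authors' implementation: in the paper both stages are latent diffusion models, whereas here stage 1 is replaced by a plain integer translation and stage 2 by a masked composite. The names `warp_garment`, `refine_texture`, and `try_on` are hypothetical, chosen for illustration.

```python
import numpy as np


def warp_garment(garment: np.ndarray, shift: tuple[int, int]) -> np.ndarray:
    """Stage 1 stand-in (assumption): translate the garment by (dy, dx) pixels.

    The paper's first stage is a learned, geometry-aware warp; a rigid shift
    is used here only to show that geometry is handled before appearance.
    """
    dy, dx = shift
    h, w = garment.shape[:2]
    warped = np.zeros_like(garment)
    # Source and destination windows, clipped so nothing wraps around.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    warped[dst_y, dst_x] = garment[src_y, src_x]
    return warped


def refine_texture(person: np.ndarray, warped: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stage 2 stand-in (assumption): paste the warped garment onto the person.

    The paper's second stage is a conditional diffusion model that refines
    texture; a masked composite only marks where that model would operate.
    """
    return mask * warped + (1.0 - mask) * person


def try_on(person: np.ndarray, garment: np.ndarray, shift: tuple[int, int]) -> np.ndarray:
    """Run the two stages in sequence: geometry first, appearance second."""
    warped = warp_garment(garment, shift)
    # Treat any non-zero garment pixel as garment region (toy mask).
    mask = (warped.sum(axis=-1, keepdims=True) > 0).astype(person.dtype)
    return refine_texture(person, warped, mask)


# Toy usage: a 4x4 RGB "person" and a garment consisting of one white pixel.
person = np.zeros((4, 4, 3))
garment = np.zeros((4, 4, 3))
garment[0, 0] = 1.0
result = try_on(person, garment, shift=(1, 2))
```

The point of the sketch is the interface: stage 2 consumes only the output of stage 1, so either stage could be swapped for a stronger model without changing the other.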

Findings

01 Outperforms state-of-the-art methods in quantitative metrics

02 Achieves superior visual realism and garment detail preservation

03 Demonstrates robustness across diverse poses and clothing styles

Abstract

Virtual try-on (VTON) aims to synthesize realistic images of a person wearing a target garment, with broad applications in e-commerce and digital fashion. While recent advances in latent diffusion models have substantially improved visual quality, existing approaches still struggle with preserving fine-grained garment details, achieving precise garment-body alignment, maintaining inference efficiency, and generalizing to diverse poses and clothing styles. To address these challenges, we propose DiffFit, a novel two-stage latent diffusion framework for high-fidelity virtual try-on. DiffFit adopts a progressive generation strategy: the first stage performs geometry-aware garment warping, aligning the garment with the target body through fine-grained deformation and pose adaptation. The second stage refines texture fidelity via a cross-modal conditional diffusion model that integrates the…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis