LPH-VTON: Resolving the Structure-Texture Dilemma of Virtual Try-On via Latent Process Handover

Yixin Liu; Baihong Qian; Jinglin Jiang; Jeffery Wu; Yan Chen; Wei Wang; Yida Wang; Lanqing Yang; Guangtao Xue

arXiv:2605.14874 · cs.CV · May 15, 2026

TL;DR

LPH-VTON is a virtual try-on framework that balances structural accuracy against textural detail by splitting generation into a structure phase and a texture phase within a single, continuous denoising process.

Contribution

The paper formalizes the structure-texture trade-off in diffusion-based VTON as a consequence of complementary inductive biases in prevailing architectures, and proposes a single continuous denoising process that decouples structure generation from texture generation.

Findings

01

Achieves a better balance between perceptual faithfulness and structural alignment.

02

Sets a new state of the art on the VITON-HD dataset.

03

Demonstrates the effectiveness of temporal architectural decoupling.

Abstract

Virtual Try-On (VTON) aims to synthesize photorealistic images of garments precisely aligned with a person's body and pose. Current diffusion-based methods, however, face a fundamental trade-off between structural integrity and textural fidelity. In this paper, we formalize this challenge as a consequence of complementary inductive biases inherent in prevailing architectures: models heavily reliant on spatial constraints naturally favor geometric alignment but often suppress textures, whereas models dominated by unconstrained generative priors excel at vibrant detail rendering but are prone to structural drift. Based on this diagnosis, we propose LPH-VTON, a new synergistic framework that resolves this tension within a single, continuous denoising process. LPH-VTON strategically decomposes the generation, leveraging a structure-biased model to establish a geometrically consistent latent…
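The handover idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the two denoisers are hypothetical stand-ins (simple arithmetic on a list of floats), and the names `structure_step`, `texture_step`, `handover_denoise`, and the `handover_step` parameter are assumptions introduced here for illustration only. The sketch shows just the control flow: one latent is carried through one trajectory, with a structure-biased update used for the early steps and a texture-biased update for the remaining ones.

```python
from typing import Callable, List, Tuple

Latent = List[float]
Denoiser = Callable[[Latent, int], Latent]


def structure_step(z: Latent, t: int) -> Latent:
    # Hypothetical stand-in for a structure-biased denoiser:
    # halves every latent value (a real model would predict noise).
    return [0.5 * v for v in z]


def texture_step(z: Latent, t: int) -> Latent:
    # Hypothetical stand-in for a texture-biased denoiser:
    # adds a fixed offset of 0.25 to every latent value.
    return [v + 0.25 for v in z]


def handover_denoise(
    z: Latent,
    num_steps: int,
    handover_step: int,
    early: Denoiser,
    late: Denoiser,
) -> Tuple[Latent, List[str]]:
    """Run one denoising trajectory, switching denoisers at handover_step.

    Steps 0 .. handover_step-1 use `early`; the rest use `late`.
    The same latent is passed along, so the process stays continuous.
    """
    if not 0 <= handover_step <= num_steps:
        raise ValueError("handover_step must lie in [0, num_steps]")
    schedule: List[str] = []
    for step in range(num_steps):
        if step < handover_step:
            z = early(z, step)
            schedule.append("structure")
        else:
            z = late(z, step)
            schedule.append("texture")
    return z, schedule


# Toy run: 4 steps, handing over after the first 2.
z_final, schedule = handover_denoise(
    [8.0, -4.0],
    num_steps=4,
    handover_step=2,
    early=structure_step,
    late=texture_step,
)
print(schedule)
print(z_final)
```

In this sketch the handover point is a single integer chosen by the caller. How LPH-VTON actually selects the handover point and couples the two models is not stated in the truncated abstract above, so that part is left open.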
