Training-free Clothing Region of Interest Self-correction for Virtual Try-On

Shengjie Lu; Zhibin Wan; Jiejie Liu; Quan Zhang; Mingjie Sun

arXiv:2512.07126·cs.CV·December 9, 2025

TL;DR

This paper introduces a training-free virtual try-on method that applies an energy-based constraint to attention maps so that generated clothing details stay closer to the target garment. It also proposes a new evaluation metric, VTID, and reports better results than previous methods on the VITON-HD and DressCode datasets.

Contribution

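The paper presents a training-free approach that uses an energy function to focus attention on the clothing region of interest during generation, along with a new metric, Virtual Try-on Inception Distance (VTID), that accounts for alignment with the target clothing in addition to image realism.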

Findings

01. Outperforms SOTA in LPIPS, FID, KID, and VTID metrics.

02. Improves downstream clothing re-identification performance.

03. Demonstrates effectiveness on VITON-HD and DressCode datasets.

Abstract

VTON (Virtual Try-ON) aims at synthesizing the target clothing on a certain person, preserving the details of the target clothing while keeping the rest of the person unchanged. Existing methods suffer from the discrepancies between the generated clothing results and the target ones, in terms of the patterns, textures and boundaries. Therefore, we propose to use an energy function to impose constraints on the attention map extracted through the generation process. Thus, at each generation step, the attention can be more focused on the clothing region of interest, thereby influencing the generation results to be more consistent with the target clothing details. Furthermore, to address the limitation that existing evaluation metrics concentrate solely on image realism and overlook the alignment with target elements, we design a new metric, Virtual Try-on Inception Distance (VTID), to…
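The idea of constraining an attention map with an energy function can be illustrated with a small sketch. The abstract does not give the paper's actual energy function or say which variable is updated, so everything below is an assumption made for illustration: the energy is taken to be the fraction of attention mass that falls outside a binary clothing mask, the attention map is a softmax over a grid of logits, and the gradient step is applied directly to those logits. The names `roi_energy`, `softmax`, and `energy_step` are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Softmax over all entries of the array (a single attention map)."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

def roi_energy(attn, mask, eps=1e-8):
    """Assumed energy: fraction of attention mass outside the clothing mask.

    attn: non-negative attention map, shape (H, W)
    mask: binary clothing region of interest, shape (H, W)
    """
    inside = (attn * mask).sum()
    return 1.0 - inside / (attn.sum() + eps)

def energy_step(logits, mask, step_size=1.0):
    """One gradient-descent step on the logits to lower the assumed energy.

    With a = softmax(logits) and E = 1 - sum(a * mask), the gradient is
    dE/dlogits = -a * (mask - sum(a * mask)).
    """
    attn = softmax(logits)
    inside = (attn * mask).sum()
    grad = -attn * (mask - inside)
    return logits - step_size * grad

# Toy example: uniform attention on a 4x4 grid, clothing in the centre 2x2.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
logits = np.zeros((4, 4))

energy_before = roi_energy(softmax(logits), mask)
for _ in range(50):  # stands in for repeated correction across generation steps
    logits = energy_step(logits, mask)
energy_after = roi_energy(softmax(logits), mask)

print(f"energy before: {energy_before:.3f}, after: {energy_after:.3f}")
```

In a diffusion-based try-on pipeline the gradient of such an energy would more plausibly be taken with respect to the latent at each denoising step rather than the attention logits themselves; the sketch only shows how lowering the energy moves attention mass into the region of interest.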

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis