CORAL: Correspondence Alignment for Improved Virtual Try-On

Jiyoung Kim; Youngjin Shin; Siyoon Jin; Dahyun Chung; Jisu Nam; Tongmin Kim; Jongjae Park; Hyeonwoo Kang; Seungryong Kim

arXiv:2602.17636 · cs.CV · February 20, 2026

TL;DR

This paper introduces CORAL, a framework that explicitly aligns person-garment correspondence in virtual try-on using Diffusion Transformers, improving detail preservation and shape transfer in unpaired settings.

Contribution

The paper proposes CORAL, a DiT-based method that applies correspondence alignment, together with a new evaluation protocol for assessing virtual try-on performance.

Findings

1. Improved garment detail preservation in VTON.

2. Enhanced global shape transfer accuracy.

3. Validated effectiveness through extensive ablations.

Abstract

Existing methods for Virtual Try-On (VTON) often struggle to preserve fine garment details, especially in unpaired settings where accurate person-garment correspondence is required. These methods do not explicitly enforce person-garment alignment and fail to explain how correspondence emerges within Diffusion Transformers (DiTs). In this paper, we first analyze full 3D attention in DiT-based architecture and reveal that the person-garment correspondence critically depends on precise person-garment query-key matching within the full 3D attention. Building on this insight, we then introduce CORrespondence ALignment (CORAL), a DiT-based framework that explicitly aligns query-key matching with robust external correspondences. CORAL integrates two complementary components: a correspondence distillation loss that aligns reliable matches with person-garment attention, and an entropy…
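To make the idea of aligning query-key matching with external correspondences more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that person tokens supply the queries, garment tokens supply the keys, and that an external matcher provides, for each person token, the index of its matched garment token plus a reliability flag. The function names, the restriction of the softmax to the person-to-garment block, and the negative log-likelihood form of the loss are all assumptions made for illustration; the abstract does not specify the loss.

```python
import numpy as np


def person_to_garment_attention(q_person, k_garment):
    """Softmax attention from person queries to garment keys.

    Assumption: only the person-to-garment block is normalized here,
    whereas full 3D attention would normalize over all tokens.
    q_person: (n_person, d), k_garment: (n_garment, d)
    """
    d = q_person.shape[-1]
    logits = q_person @ k_garment.T / np.sqrt(d)
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


def correspondence_distillation_loss(attn, match_idx, reliable):
    """Hypothetical distillation loss: negative log attention mass that
    each person token places on its externally matched garment token,
    averaged over the matches flagged as reliable.

    attn: (n_person, n_garment) row-stochastic attention
    match_idx: (n_person,) integer index of the matched garment token
    reliable: (n_person,) boolean mask of trusted matches
    """
    eps = 1e-12
    picked = attn[np.arange(attn.shape[0]), match_idx]
    nll = -np.log(picked + eps)
    n_reliable = reliable.sum()
    if n_reliable == 0:
        return 0.0
    return float((nll * reliable).sum() / n_reliable)


# Toy example: 3 person tokens, 4 garment tokens, feature dimension 4.
k_garment = np.eye(4) * 10.0
match_idx = np.array([2, 0, 3])
reliable = np.array([True, True, True])

# Queries that coincide with their matched keys give a small loss.
q_aligned = k_garment[match_idx]
loss_aligned = correspondence_distillation_loss(
    person_to_garment_attention(q_aligned, k_garment), match_idx, reliable
)

# Queries that all point at garment token 1 give a large loss.
q_misaligned = k_garment[np.array([1, 1, 1])]
loss_misaligned = correspondence_distillation_loss(
    person_to_garment_attention(q_misaligned, k_garment), match_idx, reliable
)
```

In this toy setup the loss is small when each person query attends to its matched garment key and large when it attends elsewhere, which is the behavior a correspondence-alignment objective of this kind would reward. The reliability mask reflects the abstract's emphasis on aligning only reliable matches.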

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Face Recognition and Analysis