Image-Based Virtual Try-On: A Survey
Dan Song, Xuanpu Zhang, Juan Zhou, Weizhi Nie, Ruofeng Tong, Mohan Kankanhalli, An-An Liu

TL;DR
This survey comprehensively reviews current image-based virtual try-on techniques, analyzing methodologies, evaluation metrics, and future directions to bridge the gap between research and commercial applications.
Contribution
It provides a detailed overview of state-of-the-art methods, introduces a unified evaluation framework, and highlights unresolved issues to guide future research in virtual try-on.
Findings
Assessment of semantic alignment using CLIP
Evaluation of methods on a common dataset
Identification of key challenges and future directions
Abstract
Image-based virtual try-on aims to synthesize a naturally dressed person image with a clothing image, which revolutionizes online shopping and inspires related topics within image generation, showing both research significance and commercial potential. However, there is a gap between current research progress and commercial applications, and an absence of a comprehensive overview of this field to accelerate its development. In this survey, we provide a comprehensive analysis of the state-of-the-art techniques and methodologies in terms of pipeline architecture, person representation, and key modules such as try-on indication, clothing warping, and the try-on stage. We additionally apply CLIP to assess the semantic alignment of try-on results, and evaluate representative methods with uniformly implemented evaluation metrics on the same dataset. In addition to quantitative and qualitative…
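As a rough illustration of what a CLIP-based semantic alignment check could look like, the sketch below scores pairs of embedding vectors by cosine similarity and averages the result. It is not the survey's protocol, which is not specified on this page. The assumption is that each try-on result and its target garment have already been encoded by a CLIP image encoder. The function names `cosine_similarity` and `mean_alignment` are hypothetical, and the vectors in the usage lines are toy values, not real CLIP outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_alignment(
    tryon_embeddings: Sequence[Sequence[float]],
    garment_embeddings: Sequence[Sequence[float]],
) -> float:
    """Average pairwise similarity over a test set.

    Assumption: tryon_embeddings[i] and garment_embeddings[i] were produced
    by the same (hypothetical here) CLIP image encoder for the i-th pair.
    """
    if not tryon_embeddings or len(tryon_embeddings) != len(garment_embeddings):
        raise ValueError("need two non-empty lists of equal length")
    scores = [
        cosine_similarity(t, g)
        for t, g in zip(tryon_embeddings, garment_embeddings)
    ]
    return sum(scores) / len(scores)


# Toy 2-D vectors standing in for real embeddings.
toy_tryon = [[1.0, 0.0], [0.0, 1.0]]
toy_garment = [[1.0, 0.0], [1.0, 0.0]]
toy_score = mean_alignment(toy_tryon, toy_garment)
```

In practice the embeddings would come from a pretrained CLIP model, and a higher mean score would suggest that generated results stay semantically closer to the target garments. How to crop or mask the clothing region before encoding is a separate design choice that this sketch leaves open.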
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Face recognition and analysis
Methods: Contrastive Language-Image Pre-training
