TL;DR
OmniTry is a unified, mask-free virtual try-on framework that handles diverse wearable objects beyond clothing. It trains in two stages: first on large-scale unpaired images to learn where objects belong, then on paired images to keep the object's appearance consistent.
Contribution
The paper presents OmniTry, a novel unified VTON framework that extends to various wearable objects without masks, using a two-stage training pipeline with unpaired and paired data.
Findings
Outperforms existing methods in object localization and ID-preservation
After the first (unpaired) training stage, converges quickly even with few paired samples
Achieves comprehensive results across 12 wearable object classes
Abstract
Virtual Try-ON (VTON) is a practical and widely applied task, for which most existing works focus on clothes. This paper presents OmniTry, a unified framework that extends VTON beyond garments to encompass any wearable object, e.g., jewelry and accessories, in a mask-free setting for more practical application. When extending to various types of objects, data curation is challenging for obtaining paired images, i.e., the object image and the corresponding try-on result. To tackle this problem, we propose a two-stage pipeline: For the first stage, we leverage large-scale unpaired images, i.e., portraits with any wearable items, to train the model for mask-free localization. Specifically, we repurpose the inpainting model to automatically draw objects in suitable positions given an empty mask. For the second stage, the model is further fine-tuned with paired images to transfer the…