Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild
Nadav Orzech, Yotam Nitzan, Ulysse Mizrahi, Dov Danon, Amit H. Bermano

TL;DR
This paper introduces a zero-shot, training-free virtual try-on method that applies extended attention within a pretrained diffusion model, and reports improved image quality without any additional training or supervision.
Contribution
It proposes a novel zero-shot inpainting approach leveraging diffusion models and extended attention, eliminating the need for training data and improving generalization.
Findings
Outperforms state-of-the-art methods in image quality and garment preservation.
Effectively handles unseen clothing and human figures.
Reduces computational complexity compared to supervised approaches.
Abstract
Virtual Try-On (VTON) is a highly active line of research, with increasing demand. It aims to replace a garment in an image with one from another image, while preserving person and garment characteristics as well as image fidelity. Current literature takes a supervised approach for the task, impairing generalization and imposing heavy computation. In this paper, we present a novel zero-shot training-free method for inpainting a clothing garment by reference. Our approach employs the prior of a diffusion model with no additional training, fully leveraging its native generalization capabilities. The method employs extended attention to transfer image information from reference to target images, overcoming two significant challenges. We first warp the reference garment over the target human using deep features, alleviating "texture sticking". We then leverage the extended…
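The abstract is cut off before it says how the extended attention is applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single attention head, plain Python lists as feature vectors, and a hypothetical boolean mask `ref_mask` that marks which reference tokens (for example, garment pixels) the target is allowed to attend to. The function name and all parameter names are illustrative.

```python
import math


def masked_extended_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref, ref_mask):
    """Single-head attention where target queries also see reference tokens.

    q_tgt, k_tgt, v_tgt: lists of d-dimensional vectors for the target image.
    k_ref, v_ref: lists of d-dimensional vectors for the reference image.
    ref_mask: list of bools, True where a reference token may be attended to
              (an assumption of this sketch, e.g. the garment region).
    Returns (outputs, weights), one row per target query.
    """
    d = len(q_tgt[0])
    # "Extended" attention: concatenate target and reference keys/values.
    keys = k_tgt + k_ref
    values = v_tgt + v_ref
    # Target tokens are always visible; reference tokens follow the mask.
    allowed = [True] * len(k_tgt) + list(ref_mask)

    outputs, all_weights = [], []
    for q in q_tgt:
        # Scaled dot-product scores; masked keys get -inf so softmax gives 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if ok else -math.inf
            for k, ok in zip(keys, allowed)
        ]
        # Numerically stable softmax over the allowed keys.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of values.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

In this sketch the mask only removes reference tokens from the softmax, so each target query still distributes its full attention weight over its own tokens plus the permitted reference tokens. In an actual diffusion pipeline this operation would sit inside the self-attention layers of the denoising network; how and where the paper applies it is not recoverable from the truncated abstract.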
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
Methods: Softmax · Attention Is All You Need · Inpainting · Diffusion
