DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning

Jiabao Wei; Zhiyuan Ma

arXiv:2410.12501 · cs.CV · October 17, 2024

TL;DR

DH-VTON is a novel deep learning model for virtual try-on that uses hybrid attention and deep semantic garment features to improve realism and detail preservation in synthesized images.

Contribution

The paper introduces DH-VTON, combining InternViT-6B and GFC+ modules with hybrid attention to enhance semantic garment understanding and multi-scale feature integration in virtual try-on.

Findings

1. Outperforms previous diffusion- and GAN-based methods
2. Preserves garment details effectively
3. Generates more authentic human images

Abstract

Virtual Try-ON (VTON) aims to synthesize images of a specific person dressed in given garments, and it has recently received considerable attention in online shopping scenarios. Currently, the core challenges of the VTON task mainly lie in two areas. The first is fine-grained semantic extraction (i.e., deep semantics) of the given reference garments during depth estimation. The second is effective texture preservation when the garments are synthesized and warped onto the human body. To cope with these issues, we propose DH-VTON, a deep text-driven virtual try-on model featuring a special hybrid attention learning strategy and a deep garment semantic preservation module. Building on a well-established pre-trained paint-by-example (abbr. PBE) approach, we present our DH-VTON pipeline in this work. Specifically, to extract the deep semantics of the garments, we first introduce InternViT-6B as a fine-grained feature learner, which…
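The abstract names a hybrid attention learning strategy but is cut off before describing it. As a rough illustration only, the sketch below shows one generic way to combine self-attention over person latent tokens with cross-attention to garment-feature tokens (for example, tokens produced by an image encoder such as InternViT-6B). The helper names (`attention`, `hybrid_attention`), the single-head NumPy formulation, the token counts, and the additive residual fusion are all assumptions made for illustration. They are not DH-VTON's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))
    return weights @ v, weights

def hybrid_attention(person, garment, params):
    """Hypothetical hybrid block (an assumption, not the paper's method):
    self-attention over person tokens plus cross-attention from person
    tokens to garment tokens, both added back residually.
    person: (N, d) latent tokens; garment: (M, d) garment-feature tokens."""
    wq_s, wk_s, wv_s, wq_c, wk_c, wv_c = params
    # Self-attention branch: person tokens attend to each other.
    self_out, _ = attention(person @ wq_s, person @ wk_s, person @ wv_s)
    # Cross-attention branch: person tokens query the garment tokens.
    cross_out, cross_w = attention(person @ wq_c, garment @ wk_c, garment @ wv_c)
    return person + self_out + cross_out, cross_w

rng = np.random.default_rng(0)
d = 16
person = rng.standard_normal((64, d))   # e.g. an 8x8 latent grid, flattened
garment = rng.standard_normal((10, d))  # e.g. 10 garment-feature tokens
params = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(6)]
out, cross_w = hybrid_attention(person, garment, params)
print(out.shape, cross_w.shape)  # (64, 16) (64, 10)
```

Each row of `cross_w` shows how strongly one person token draws on each garment token. Summing the two branches is only one possible fusion choice. Concatenating garment tokens into the self-attention keys and values is another common pattern.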

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies

Methods: Softmax · Attention Is All You Need · ALIGN · Contrastive Language-Image Pre-training