TED-VITON: Transformer-Empowered Diffusion Models for Virtual Try-On
Zhenchen Wan, Yanwu Xu, Zhaoqing Wang, Feng Liu, Tongliang Liu, Mingming Gong

TL;DR
TED-VITON introduces a transformer-based diffusion model for virtual try-on that significantly improves visual quality and text rendering accuracy by integrating specialized adapters, loss functions, and prompt optimization techniques.
Contribution
The paper presents TED-VITON, a novel framework that effectively leverages DiT-based T2I models for VTO, incorporating a Garment Semantic Adapter, Text Preservation Loss, and prompt optimization to enhance performance.
Findings
Achieved state-of-the-art visual quality in VTO images.
Significantly improved text rendering accuracy on garments.
Established new benchmarks for VTO tasks.
Abstract
Recent advancements in Virtual Try-On (VTO) have demonstrated exceptional efficacy in generating realistic images and preserving garment details, largely attributed to the robust generative capabilities of text-to-image (T2I) diffusion backbones. However, the T2I models that underpin these methods have become outdated, thereby limiting the potential for further improvement in VTO. Additionally, current methods face notable challenges in accurately rendering text on garments without distortion and preserving fine-grained details, such as textures and material fidelity. The emergence of Diffusion Transformer (DiT) based T2I models has showcased impressive performance and offers a promising opportunity for advancing VTO. Directly applying existing VTO techniques to transformer-based T2I models is ineffective due to substantial architectural differences, which hinder their ability to fully…
Taxonomy
Topics: Simulation Techniques and Applications
Methods: Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax · Attention Is All You Need
