TED-VITON: Transformer-Empowered Diffusion Models for Virtual Try-On

Zhenchen Wan, Yanwu Xu, Zhaoqing Wang, Feng Liu, Tongliang Liu, Mingming Gong

arXiv:2411.17017 · cs.CV · March 12, 2025


TL;DR

TED-VITON introduces a transformer-based diffusion model for virtual try-on that significantly improves visual quality and text rendering accuracy by integrating specialized adapters, loss functions, and prompt optimization techniques.

Contribution

The paper presents TED-VITON, a framework that adapts Diffusion Transformer (DiT) based text-to-image (T2I) models to virtual try-on (VTO). It combines three components to improve performance: a Garment Semantic Adapter, a Text Preservation Loss, and prompt optimization.

Findings

01. Achieved state-of-the-art visual quality in VTO images.
02. Significantly improved text rendering accuracy on garments.
03. Established new benchmarks for VTO tasks.

Abstract

Recent advancements in Virtual Try-On (VTO) have demonstrated exceptional efficacy in generating realistic images and preserving garment details, largely attributed to the robust generative capabilities of text-to-image (T2I) diffusion backbones. However, the T2I models that underpin these methods have become outdated, thereby limiting the potential for further improvement in VTO. Additionally, current methods face notable challenges in accurately rendering text on garments without distortion and preserving fine-grained details, such as textures and material fidelity. The emergence of Diffusion Transformer (DiT) based T2I models has showcased impressive performance and offers a promising opportunity for advancing VTO. Directly applying existing VTO techniques to transformer-based T2I models is ineffective due to substantial architectural differences, which hinder their ability to fully…

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

zhenchenwan/ted-viton
none · Official

Videos

No videos yet.

Taxonomy

Topics: Simulation Techniques and Applications

Methods: Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax · Attention Is All You Need