Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

Seungyong Lee; Jeong-gi Kwak

arXiv:2508.04825·cs.GR·November 6, 2025


TL;DR

Voost is a unified diffusion transformer framework that jointly models virtual try-on and try-off tasks, improving realism and consistency in garment synthesis across pose and appearance variations.

Contribution

It introduces a scalable, joint learning approach for try-on and try-off with bidirectional supervision and novel inference techniques, without task-specific networks or extra labels.

Findings

01. Achieves state-of-the-art results on try-on and try-off benchmarks.

02. Outperforms strong baselines in alignment accuracy and visual fidelity.

03. Demonstrates robust generalization across diverse poses and garments.

Abstract

Virtual try-on aims to synthesize a realistic image of a person wearing a target garment, but accurately modeling garment-body correspondence remains a persistent challenge, especially under pose and appearance variation. In this paper, we propose Voost, a unified and scalable framework that jointly learns virtual try-on and try-off with a single diffusion transformer. By modeling both tasks jointly, Voost enables each garment-person pair to supervise both directions and supports flexible conditioning over generation direction and garment category, enhancing garment-body relational reasoning without task-specific networks, auxiliary losses, or additional labels. In addition, we introduce two inference-time techniques: attention temperature scaling for robustness to resolution or mask variation, and self-corrective sampling that leverages bidirectional consistency between tasks.…
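
The abstract names attention temperature scaling without giving its formula, so the following is only a minimal sketch of the general idea, assuming a single scalar temperature that divides the attention logits before the softmax. The function names, the `temperature` parameter, and the toy shapes are hypothetical; how Voost actually sets the temperature from resolution or mask size is not stated in this summary.

```python
import numpy as np

def attention(q, k, v, temperature=1.0):
    """Scaled dot-product attention with an extra temperature factor.

    `temperature` is an illustrative knob (an assumption, not the paper's
    exact rule): values below 1 sharpen the attention distribution,
    values above 1 flatten it.
    """
    d = q.shape[-1]
    logits = (q @ k.T) / (np.sqrt(d) * temperature)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def entropy(weights):
    """Shannon entropy of each attention row (higher means more diffuse)."""
    return -(weights * np.log(weights + 1e-12)).sum(axis=-1)

# Toy inputs: 4 query tokens, 32 key/value tokens.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
k = rng.normal(size=(32, 16))
v = rng.normal(size=(32, 8))

out_sharp, w_sharp = attention(q, k, v, temperature=0.5)
out_soft, w_soft = attention(q, k, v, temperature=2.0)
```

Comparing `entropy(w_sharp)` with `entropy(w_soft)` shows the effect: a lower temperature concentrates each query's attention on fewer keys. This is one plausible way to compensate when the number of tokens changes with resolution or mask size.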
