Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism

Jun Zheng; Jing Wang; Fuwei Zhao; Xujie Zhang; Xiaodan Liang

arXiv:2412.09822 · cs.CV · July 29, 2025

TL;DR

This paper introduces Dynamic Try-On, a video virtual try-on framework using a Diffusion Transformer with dynamic attention to improve clothing detail preservation, reduce computational costs, and ensure temporal consistency during complex movements.

Contribution

The paper proposes a video try-on framework that uses the Diffusion Transformer (DiT) backbone itself as the garment encoder, together with a dynamic feature fusion module and limb-aware dynamic attention, to improve efficiency and temporal stability.
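The summary above does not say how garment features are fused into the denoising tokens. As a rough illustration only, the sketch below shows one generic pattern for attention-based fusion: video tokens attend over the concatenation of video and garment tokens. The function name `fused_attention`, the token shapes, and the random projection matrices are all hypothetical and are not taken from the paper; this is not the authors' dynamic feature fusion module.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fused_attention(video_tokens, garment_tokens, w_q, w_k, w_v):
    """Hypothetical fusion: video tokens attend over [video; garment] tokens."""
    # Keys and values come from both token sets; queries only from the video.
    context = np.concatenate([video_tokens, garment_tokens], axis=0)
    q = video_tokens @ w_q
    k = context @ w_k
    v = context @ w_v
    # Scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy sizes, chosen arbitrarily for the example.
rng = np.random.default_rng(0)
d = 8
video_tokens = rng.normal(size=(6, d))    # 6 video tokens
garment_tokens = rng.normal(size=(4, d))  # 4 garment tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = fused_attention(video_tokens, garment_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (6, 8) (6, 10)
```

Each row of `weights` is a softmax distribution over the 10 context tokens, so every video token mixes information from both the video and the garment. How the paper actually computes and schedules this fusion is not stated in the text above.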

Findings

01. Outperforms previous methods in the stability and smoothness of video try-on results
02. Handles complex human movements with high temporal consistency
03. Reduces computational resources compared to prior approaches

Abstract

Video try-on is a promising area with substantial real-world potential. Previous research on video try-on has primarily focused on transferring product clothing images to videos with simple human poses, and performs poorly on complex movements. To better preserve clothing details, those approaches add a separate garment encoder, which increases computational resource consumption. The primary challenges in this domain are twofold: (1) leveraging the garment encoder's capabilities in video try-on while lowering computational requirements; (2) ensuring temporal consistency in the synthesis of human body parts, especially during rapid movements. To tackle these issues, we propose a novel video try-on framework based on the Diffusion Transformer (DiT), named Dynamic Try-On. To reduce computational overhead, we adopt a straightforward approach by utilizing the…

Taxonomy

Topics: Advanced Image Processing Techniques · Video Coding and Compression Technologies · Visual Attention and Saliency Detection

Methods: Softmax · Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate · Diffusion · Focus