Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism
Jun Zheng, Jing Wang, Fuwei Zhao, Xujie Zhang, Xiaodan Liang

TL;DR
This paper introduces Dynamic Try-On, a video virtual try-on framework using a Diffusion Transformer with dynamic attention to improve clothing detail preservation, reduce computational costs, and ensure temporal consistency during complex movements.
Contribution
The paper proposes a video try-on framework that leverages the DiT backbone as the garment encoder, combined with a dynamic feature fusion module and limb-aware dynamic attention, to improve efficiency and temporal stability.
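To make the idea of fusing garment features into a shared backbone more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this summary does not describe the fusion module in detail, so the sketch uses a generic and widely used pattern in which person tokens attend over the concatenation of person and garment tokens, with both streams passing through the same projection weights. The names `shared_projection` and `fused_attention`, the token counts, and the feature dimension are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def shared_projection(tokens, weight):
    """Hypothetical stand-in for a backbone layer: one weight serves both streams."""
    return matmul(tokens, weight)


def fused_attention(person_tokens, garment_tokens, w_q, w_k, w_v):
    """Person queries attend over person and garment keys/values together.

    Assumption: fusion by concatenating tokens before attention. The paper's
    actual dynamic feature fusion module may work differently.
    """
    q = shared_projection(person_tokens, w_q)
    context = person_tokens + garment_tokens  # concatenate along the token axis
    k = shared_projection(context, w_k)
    v = shared_projection(context, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of Gaussian samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
person = random_matrix(3, dim, rng)   # 3 hypothetical person tokens
garment = random_matrix(2, dim, rng)  # 2 hypothetical garment tokens
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
out, attn = fused_attention(person, garment, w_q, w_k, w_v)
print(len(out), len(out[0]), len(attn[0]))
```

Because the garment tokens go through the same weights as the person tokens, this pattern adds no second set of encoder parameters, which is the kind of saving the summary attributes to dropping the separate garment encoder.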
Findings
Outperforms previous methods in stability and smoothness of video try-on results
Effectively handles complex human movements with high temporal consistency
Reduces computational resources compared to prior approaches
Abstract
Video try-on stands as a promising area for its tremendous real-world potential. Previous research on video try-on has primarily focused on transferring product clothing images to videos with simple human poses, while performing poorly with complex movements. To better preserve clothing details, those approaches are armed with an additional garment encoder, resulting in higher computational resource consumption. The primary challenges in this domain are twofold: (1) leveraging the garment encoder's capabilities in video try-on while lowering computational requirements; (2) ensuring temporal consistency in the synthesis of human body parts, especially during rapid movements. To tackle these issues, we propose a novel video try-on framework based on Diffusion Transformer (DiT), named Dynamic Try-On. To reduce computational overhead, we adopt a straightforward approach by utilizing the…
Taxonomy
Topics: Advanced Image Processing Techniques · Video Coding and Compression Technologies · Visual Attention and Saliency Detection
Methods: Softmax · Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate · Diffusion · Focus
