ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On

Jinjuan Wang; Wenzhang Sun; Ming Li; Yun Zheng; Fanyao Li; Zhulin Tao; Donglin Di; Hao Li; Wei Chen; Xianglin Huang

arXiv:2506.05858 · cs.CV · June 9, 2025

TL;DR

ChronoTailor is a diffusion-based video virtual try-on framework that uses attention guidance to transfer garments onto people in video with temporal consistency and fine detail, two areas where earlier approaches struggle.

Contribution

ChronoTailor introduces a spatio-temporal attention mechanism and multi-scale garment feature integration to improve video virtual try-on performance.

Findings

01. Outperforms previous methods in maintaining temporal continuity

02. Preserves fine-grained garment details during motion

03. Successfully handles complex dynamic scenes

Abstract

Video virtual try-on aims to seamlessly replace the clothing of a person in a source video with a target garment. Despite significant progress in this field, existing approaches still struggle to maintain continuity and reproduce garment details. In this paper, we introduce ChronoTailor, a diffusion-based framework that generates temporally consistent videos while preserving fine-grained garment details. By employing a precise spatio-temporal attention mechanism to guide the integration of fine-grained garment features, ChronoTailor achieves robust try-on performance. First, ChronoTailor leverages region-aware spatial guidance to steer the evolution of spatial attention and employs an attention-driven temporal feature fusion mechanism to generate more continuous temporal features. This dual approach not only enables fine-grained local editing but also effectively mitigates artifacts…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis