SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
Hung Nguyen, Quang Qui-Vinh Nguyen, Khoi Nguyen, Rang Nguyen

TL;DR
SwiftTry is a fast, temporally consistent video virtual try-on method built on diffusion models. It adds temporal attention layers and a novel caching technique (ShiftCaching) to improve both quality and efficiency, and it is evaluated on a new, challenging dataset.
Contribution
The paper presents a novel diffusion-based approach with temporal attention and ShiftCaching for efficient, consistent video virtual try-on, along with a new high-resolution dataset.
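The summary does not describe how the temporal attention layers are built. A common pattern in video diffusion models is to insert self-attention along the frame axis, so that each spatial location attends only to the same location in the other frames. The NumPy sketch below shows that pattern with a single head, no positional encoding, and a residual connection. It is an illustrative assumption rather than SwiftTry's actual layer, and the function and weight names (`temporal_self_attention`, `w_q`, `w_k`, `w_v`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_self_attention(feats, w_q, w_k, w_v):
    """Hypothetical single-head temporal attention.

    feats: array of shape (F, H, W, C) holding features for F frames.
    Each of the H*W spatial locations attends only across the F frames.
    """
    f, h, w, c = feats.shape
    # Put frames on the sequence axis: (H*W, F, C).
    x = feats.transpose(1, 2, 0, 3).reshape(h * w, f, c)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product scores between frames: (H*W, F, F).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)
    out = softmax(scores, axis=-1) @ v  # (H*W, F, C)
    # Restore the (F, H, W, C) layout and add a residual connection.
    out = out.reshape(h, w, f, c).transpose(2, 0, 1, 3)
    return feats + out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((8, 4, 4, 16))  # 8 frames, 4x4 latent, 16 channels
    w_q, w_k, w_v = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
    print(temporal_self_attention(feats, w_q, w_k, w_v).shape)  # (8, 4, 4, 16)
```

Attention along the frame axis costs O(F²) per spatial location, which is much cheaper than full spatiotemporal attention over all F·H·W tokens. This is one reason the pattern is a popular way to add temporal coherence to a pretrained image diffusion model.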
Findings
Outperforms existing methods in frame-to-frame (temporal) consistency
Achieves faster inference than existing methods
Handles complex backgrounds and movements effectively
Abstract
Given an input video of a person and a new garment, the objective of this paper is to synthesize a new video where the person is wearing the specified garment while maintaining spatiotemporal consistency. Although significant advances have been made in image-based virtual try-on, extending these successes to video often leads to frame-to-frame inconsistencies. Some approaches have attempted to address this by increasing the overlap of frames across multiple video chunks, but this comes at a steep computational cost due to the repeated processing of the same frames, especially for long video sequences. To tackle these challenges, we reconceptualize video virtual try-on as a conditional video inpainting task, with garments serving as input conditions. Specifically, our approach enhances image diffusion models by incorporating temporal attention layers to improve temporal coherence. To…
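The abstract says that increasing chunk overlap is expensive because the same frames are processed repeatedly. The arithmetic below makes that concrete. It assumes a plain sliding window with stride `chunk_len - overlap`, and it counts the final chunk at full length (shifted back or padded). The helper name and the example numbers (a 240-frame video, 24-frame chunks) are hypothetical and are not taken from the paper.

```python
import math


def frames_processed(num_frames, chunk_len, overlap):
    """Frame evaluations per denoising step when a video is split into
    chunks of `chunk_len` frames that overlap by `overlap` frames.

    Assumes a sliding window with stride chunk_len - overlap, with the last
    chunk shifted back or padded so that every chunk has full length.
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("need 0 <= overlap < chunk_len")
    if num_frames <= chunk_len:
        return num_frames  # a single chunk covers the whole video
    stride = chunk_len - overlap
    num_chunks = math.ceil((num_frames - overlap) / stride)
    return num_chunks * chunk_len


if __name__ == "__main__":
    for overlap in (0, 12, 18):
        print(overlap, frames_processed(240, 24, overlap))
    # 0 240
    # 12 456
    # 18 888
```

Under these assumptions, raising the overlap from 0 to 18 frames increases per-step work by a factor of 3.7. That cost is multiplied again by the number of denoising steps, which is the overhead the abstract attributes to overlap-based approaches.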
Taxonomy
Topics: Video Coding and Compression Technologies · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis
Methods: Softmax · Attention Is All You Need · Diffusion · Inpainting
