Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction
Dong Li, Wenqi Zhong, Wei Yu, Yingwei Pan, Dingwen Zhang, Ting Yao, Junwei Han, Tao Mei

TL;DR
This paper introduces DPIDM, a diffusion-based framework that models dynamic pose interactions for video virtual try-on, significantly improving temporal consistency and visual authenticity over existing methods.
Contribution
The paper proposes a novel diffusion model with a skeleton-based pose adapter and hierarchical attention to effectively capture spatiotemporal human-garment pose interactions in video try-on.
Findings
Achieves a VFID score of 0.506 on the VVT dataset, a 60.5% improvement over previous methods.
Outperforms baseline methods on multiple datasets in visual authenticity and temporal consistency.
Demonstrates the effectiveness of pose-aware spatial and temporal attention mechanisms.
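As a reading aid for the 60.5% figure above, here is a minimal sketch of how a relative improvement is usually computed for a lower-is-better metric such as VFID. The summary does not state the previous best score, so the baseline below is only back-computed from the two reported numbers under that assumption. The function names are hypothetical and not from the paper.

```python
def relative_improvement(previous: float, current: float) -> float:
    """Fractional reduction of a lower-is-better score (assumed convention)."""
    return (previous - current) / previous


def implied_baseline(current: float, improvement: float) -> float:
    """Baseline score implied by a reported score and fractional improvement."""
    return current / (1.0 - improvement)


# Reported in the summary: VFID 0.506 and a 60.5% improvement.
baseline = implied_baseline(0.506, 0.605)  # illustrative, not a reported value
check = relative_improvement(baseline, 0.506)
print(f"implied baseline VFID: {baseline:.3f}, improvement: {check:.1%}")
```

If the improvement were instead measured with a different convention (for example, relative to the new score), the implied baseline would differ.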
Abstract
Video virtual try-on aims to seamlessly dress a subject in a video with a specific garment. The primary challenge involves preserving the visual authenticity of the garment while dynamically adapting to the pose and physique of the subject. While existing methods have predominantly focused on image-based virtual try-on, extending these techniques directly to videos often results in temporal inconsistencies. Most current video virtual try-on approaches alleviate this challenge by incorporating temporal modules, yet still overlook the critical spatiotemporal pose interactions between human and garment. Effective pose interactions in videos should not only consider spatial alignment between human and garment poses in each frame but also account for the temporal dynamics of human poses throughout the entire video. With such motivation, we propose a new framework, namely Dynamic Pose…
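The distinction the abstract draws between per-frame spatial alignment and cross-frame temporal dynamics can be illustrated with a small NumPy sketch. This is not the DPIDM architecture: the identity projections, the additive pose embedding, and all names and shapes here are assumptions chosen only to show which axis each kind of attention mixes.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # x: (batch, length, dim). Identity Q/K/V projections keep the sketch short.
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def spatial_attention(feats):
    # feats: (T, N, D). Tokens attend to other tokens in the same frame only.
    return self_attention(feats)


def temporal_attention(feats):
    # Each token position attends across the T frames of the clip.
    per_token = feats.transpose(1, 0, 2)          # (N, T, D)
    return self_attention(per_token).transpose(1, 0, 2)


rng = np.random.default_rng(0)
T, N, D = 4, 6, 8                                  # frames, tokens per frame, channels
frame_feats = rng.normal(size=(T, N, D))
pose_embed = rng.normal(size=(T, N, D))            # hypothetical skeleton-derived embedding
x = frame_feats + pose_embed                       # assumed additive pose conditioning

spatial_out = spatial_attention(x)
y = temporal_attention(spatial_out)                # spatial mixing, then temporal mixing
print(y.shape)
```

Spatial attention alone leaves each frame independent of the others, which is one way to see why frame-by-frame image try-on can flicker; the temporal pass is what lets information flow between frames.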
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
Methods: Softmax · Attention Is All You Need · Diffusion · Adapter
