WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models

Zijian He; Peixin Chen; Guangrun Wang; Guanbin Li; Philip H.S. Torr; Liang Lin

arXiv:2407.10625·cs.CV·July 16, 2024

TL;DR

WildVidFit introduces a novel image-based controlled diffusion model for video virtual try-on, generating realistic, temporally coherent videos conditioned on garment descriptions and human motion, overcoming limitations of traditional methods.

Contribution

The paper presents a one-stage, diffusion-based approach that is trained on still images yet maintains temporal coherence in video try-on, reducing data and computational requirements.

Findings

01 Effective in generating fluid, coherent videos on multiple datasets.
02 Outperforms traditional warping and blending methods.
03 Leverages pre-trained models for improved temporal consistency.

Abstract

Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions, limiting their effectiveness in video try-on applications. Moreover, video-based models require extensive, high-quality data and substantial computational resources. To tackle these issues, we reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion. Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach. This model, conditioned on specific garments and individuals, is trained on still images rather than videos. It leverages diffusion guidance from pre-trained models including a video masked autoencoder for…

Code & Models

Repositories

MindSpore-scientific-2/code-7/tree/main/WildVidFit
mindspore

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment

Methods: Diffusion