ViViD: Video Virtual Try-on using Diffusion Models

Zixun Fang; Wei Zhai; Aimin Su; Hongliang Song; Kai Zhu; Mao Wang; Yu Chen; Zhiheng Liu; Yang Cao; Zheng-Jun Zha

arXiv:2405.11794·cs.CV·May 29, 2024·2 cites



TL;DR

ViViD is a diffusion-based framework for video virtual try-on that produces high-quality, temporally consistent videos by combining garment and pose encoding with hierarchical temporal modules.

Contribution

The paper presents a novel diffusion model framework for video virtual try-on, including a garment encoder, pose encoder, and temporal modules, along with a new diverse high-resolution dataset.

Findings

01 Achieves high visual quality in video try-on results.

02 Ensures temporal and spatial consistency in generated videos.

03 Outperforms previous methods in visual fidelity and coherence.

Abstract

Video virtual try-on aims to transfer a clothing item onto the video of a target person. Applying image-based try-on techniques to video frame by frame produces temporally inconsistent results, while previous video-based try-on solutions generate only blurry, low-quality output. In this work, we present ViViD, a novel framework that employs powerful diffusion models to tackle video virtual try-on. Specifically, we design a Garment Encoder to extract fine-grained clothing semantic features, guiding the model to capture garment details and inject them into the target video through the proposed attention feature fusion mechanism. To ensure spatial-temporal consistency, we introduce a lightweight Pose Encoder to encode pose signals, enabling the model to learn the interactions between clothing and human posture and insert…
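The abstract does not spell out how the attention feature fusion mechanism works, so the following is only a minimal sketch of one common way such fusion is done, not the authors' implementation. It assumes that tokens from a video frame attend over the concatenation of their own tokens and the garment tokens, and that only the frame-side outputs are kept. The function names (`attention`, `fuse_garment_features`), the toy token values, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output vector is a weighted average of the value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def fuse_garment_features(frame_tokens, garment_tokens):
    """Hypothetical fusion: frame tokens attend over frame + garment tokens.

    Assumption: no learned projections; a real model would apply separate
    linear maps to form queries, keys, and values.
    """
    context = frame_tokens + garment_tokens  # concatenate along the token axis
    return attention(frame_tokens, context, context)


# Toy inputs: 2 frame tokens and 3 garment tokens, each of dimension 2.
frame_tokens = [[1.0, 0.0], [0.0, 1.0]]
garment_tokens = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]
fused = fuse_garment_features(frame_tokens, garment_tokens)
```

In this sketch the output has one vector per frame token, so the fused features can replace the frame features without changing their shape, while each vector now mixes in garment information according to the attention weights.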


Code & Models

Repositories

alibaba-yuanjing-aigclab/vivid (PyTorch)


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Diffusion