I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

Xun Guo; Mingwu Zheng; Liang Hou; Yuan Gao; Yufan Deng; Pengfei Wan; Di Zhang; Yufan Liu; Weiming Hu; Zhengjun Zha; Haibin Huang; Chongyang Ma

arXiv:2312.16693 · cs.CV · June 28, 2024 · 1 citation

Open Access · 2 Repos · 1 Model

TL;DR

I2V-Adapter is a versatile, lightweight module that enhances image-to-video generation by preserving input image identity and maintaining compatibility with existing models, enabling high-quality, controllable videos.

Contribution

It introduces a novel cross-frame attention mechanism and a Frame Similarity Prior, allowing effective image-to-video translation without altering pretrained models.
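As a rough illustration of the idea, the sketch below places a cross-frame attention branch next to a frozen self-attention layer, so that every frame can read features from the first (input-image) frame. It is a minimal NumPy sketch, not the authors' implementation: the function names, the tensor layout, the reuse of the frozen key/value projections, and the zero-initialized output projection are all assumptions made here for illustration, and the Frame Similarity Prior is not modeled.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q: (..., Nq, d); k, v: (..., Nk, d). Leading axes broadcast.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def frozen_self_attention(x, w_q, w_k, w_v, w_out):
    # x: (frames, tokens, dim). Each frame attends only to its own tokens.
    return attention(x @ w_q, x @ w_k, x @ w_v) @ w_out

def cross_frame_branch(x, w_q_adapter, w_k, w_v, w_out_adapter):
    # Keys and values come from frame 0 only (assumed to hold the input image).
    k_first = x[:1] @ w_k  # (1, tokens, dim), broadcast over all frames
    v_first = x[:1] @ w_v
    return attention(x @ w_q_adapter, k_first, v_first) @ w_out_adapter

def adapted_block(x, w_q, w_k, w_v, w_out, w_q_adapter, w_out_adapter):
    # Frozen output plus the adapter branch (hypothetical additive wiring).
    return (frozen_self_attention(x, w_q, w_k, w_v, w_out)
            + cross_frame_branch(x, w_q_adapter, w_k, w_v, w_out_adapter))

rng = np.random.default_rng(0)
frames, tokens, dim = 4, 6, 8
x = rng.normal(size=(frames, tokens, dim))
w_q, w_k, w_v, w_out = (0.1 * rng.normal(size=(dim, dim)) for _ in range(4))

# Adapter parameters: only these would be trained in this sketch.
w_q_adapter = w_q.copy()               # assumed: start from the frozen query projection
w_out_adapter = np.zeros((dim, dim))   # assumed: zero-init, so the branch starts silent

baseline = frozen_self_attention(x, w_q, w_k, w_v, w_out)
adapted = adapted_block(x, w_q, w_k, w_v, w_out, w_q_adapter, w_out_adapter)
print(adapted.shape)
print(np.allclose(adapted, baseline))
```

With the adapter's output projection at zero, the block reproduces the frozen layer exactly, which is one way an added module can leave a pretrained model's behavior untouched at the start of training.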

Findings

01 Produces high-quality, coherent videos with preserved image identity

02 Requires only a few trainable parameters, reducing training costs

03 Ensures compatibility with existing community-driven models

Abstract

Text-guided image-to-video (I2V) generation aims to generate a coherent video that preserves the identity of the input image and semantically aligns with the input prompt. Existing methods typically augment pretrained text-to-video (T2V) models by either concatenating the image with noised video frames channel-wise before being fed into the model or injecting the image embedding produced by pretrained image encoders in cross-attention modules. However, the former approach often necessitates altering the fundamental weights of pretrained T2V models, thus restricting the model's compatibility within the open-source communities and disrupting the model's prior knowledge. Meanwhile, the latter typically fails to preserve the identity of the input image. We present I2V-Adapter to overcome such limitations. I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames…
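The weight-compatibility problem attributed above to channel-wise concatenation can be seen at the level of tensor shapes. The sketch below is a toy NumPy example under stated assumptions: a 1x1 convolution stands in for a T2V model's first layer, and every name and size is hypothetical. Zero-padding the weight is a common way to widen such a layer and is assumed here for illustration; it is not described in the text above.

```python
import numpy as np

def concat_image_condition(noised, image):
    # noised: (frames, C, H, W); image: (C, H, W). Tile the image over frames
    # and stack it onto the channel axis, giving (frames, 2C, H, W).
    tiled = np.broadcast_to(image, noised.shape)
    return np.concatenate([noised, tiled], axis=1)

def conv1x1(x, weight):
    # Stand-in for a model's first convolution: weight is (C_out, C_in).
    if weight.shape[1] != x.shape[1]:
        raise ValueError(
            f"layer expects {weight.shape[1]} input channels, got {x.shape[1]}"
        )
    return np.einsum("oc,fchw->fohw", weight, x)

rng = np.random.default_rng(0)
frames, channels, height, width, hidden = 4, 4, 8, 8, 16
noised = rng.normal(size=(frames, channels, height, width))
image = rng.normal(size=(channels, height, width))
pretrained_weight = rng.normal(size=(hidden, channels))  # hypothetical pretrained layer

conditioned = concat_image_condition(noised, image)
print(conditioned.shape)  # (4, 8, 8, 8): the channel count has doubled

# The pretrained layer no longer accepts the conditioned input as-is.
try:
    conv1x1(conditioned, pretrained_weight)
    fits = True
except ValueError:
    fits = False
print(fits)  # False

# Widening the layer means replacing its weight tensor. Zero columns for the
# new channels keep the initial output equal to the unconditioned one.
widened_weight = np.concatenate(
    [pretrained_weight, np.zeros((hidden, channels))], axis=1
)
out = conv1x1(conditioned, widened_weight)
print(np.allclose(out, conv1x1(noised, pretrained_weight)))  # True
```

Even in this toy setting the widened weight is a new tensor with a different shape from the pretrained one, so a checkpoint trained this way no longer lines up with other modules built for the original layer.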

Code & Models

Repositories

Models

🤗 Boese0601/X-Dyna · model · 76 downloads · ♡ 7

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications

Methods: Diffusion · Adapter · Focus