TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni; Bernhard Egger; Suhas Lohit; Anoop Cherian; Ye Wang; Toshiaki Koike-Akino; Sharon X. Huang; Tim K. Marks

arXiv:2404.16306 · cs.CV · April 26, 2024

TL;DR

TI2V-Zero is a zero-shot, tuning-free method that enables a pretrained text-to-video diffusion model to generate videos conditioned on an input image and text, without additional training or external modules.

Contribution

The paper introduces a zero-shot approach that combines a "repeat-and-slide" strategy with inversion techniques to condition a pretrained text-to-video model on an input image without fine-tuning.

Findings

1. Outperforms recent open-domain TI2V models on various datasets.
2. Supports video infilling, prediction, and long video generation.
3. Operates without optimization or external modules.

Abstract

Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free method that empowers a pretrained text-to-video (T2V) diffusion model to be conditioned on a provided image, enabling TI2V generation without any optimization, fine-tuning, or introducing external modules. Our approach leverages a pretrained T2V diffusion foundation model as the generative prior. To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a…
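The abstract names the "repeat-and-slide" strategy but is cut off before describing it in full, so the following is only a minimal sketch of how such a scheme could be organized, not the authors' implementation. It assumes that the input image is repeated to fill a fixed-size conditioning window, that each new frame is produced from the frames currently in the window, and that the window then slides forward by one frame. The names `repeat_and_slide`, `next_frame_fn`, `window`, and `toy_next_frame` are hypothetical, and the toy function merely stands in for the frozen T2V model's reverse denoising, which is not modeled here.

```python
from collections import deque
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image or latent frame


def repeat_and_slide(
    first_frame: Frame,
    next_frame_fn: Callable[[Sequence[Frame]], Frame],
    window: int,
    num_new_frames: int,
) -> List[Frame]:
    """Generate frames one at a time from a sliding conditioning window."""
    # Repeat: fill the conditioning window with copies of the input image.
    queue = deque((list(first_frame) for _ in range(window)), maxlen=window)
    video = [list(first_frame)]
    for _ in range(num_new_frames):
        # Hypothetical call: a real system would run the frozen T2V model's
        # reverse denoising here, conditioned on the frames in the window.
        new_frame = next_frame_fn(tuple(queue))
        video.append(new_frame)
        # Slide: a deque with maxlen drops its oldest frame on append.
        queue.append(new_frame)
    return video


def toy_next_frame(frames: Sequence[Frame]) -> Frame:
    """Toy stand-in for the model: per-element window mean plus one."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n + 1.0 for i in range(len(frames[0]))]
```

Calling `repeat_and_slide([0.0, 0.0], toy_next_frame, window=2, num_new_frames=3)` returns four frames: the input image followed by three generated ones. Because each step depends only on the current window, the same loop could in principle run for any number of frames, which is one way to read the long video generation finding.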

Code & Models

Repositories

merlresearch/TI2V-Zero (PyTorch, official)

Taxonomy

Topics: Advanced Image Processing Techniques · Radiomics and Machine Learning in Medical Imaging · Mycobacterium research and diagnosis

Methods: Diffusion