Make It Move: Controllable Image-to-Video Generation with Text Descriptions

Yaosi Hu; Chong Luo; Zhenzhong Chen

arXiv:2112.02815 · cs.CV · April 1, 2022 · 1 citation

TL;DR

This paper introduces a new task, Text-Image-to-Video generation (TI2V), in which a controllable video is generated from a static image and a text description. The main challenges are aligning appearance and motion across the two modalities and handling the uncertainty in text descriptions.

Contribution

The paper proposes MAGE (Motion Anchor-based video GEnerator), which combines a novel motion anchor structure with a recursive transformer-based approach for controllable, diverse video generation from an image and text.

Findings

1. MAGE effectively aligns appearance and motion for video generation.

2. TI2V demonstrates controllability and diversity in generated videos.

3. Experiments on new datasets show promising results for the approach.

Abstract

Generating controllable videos conforming to user intentions is an appealing yet challenging topic in computer vision. To enable maneuverable control in line with user intentions, a novel video generation task, named Text-Image-to-Video generation (TI2V), is proposed. With both controllable appearance and motion, TI2V aims at generating videos from a static image and a text description. The key challenges of the TI2V task lie both in aligning appearance and motion from different modalities, and in handling uncertainty in text descriptions. To address these challenges, we propose a Motion Anchor-based video GEnerator (MAGE) with an innovative motion anchor (MA) structure to store appearance-motion aligned representation. To model the uncertainty and increase the diversity, it further allows the injection of explicit condition and implicit randomness. Through three-dimensional axial…
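
As a rough illustration of the data flow the abstract describes (a static image and a text description fused into a motion anchor, then unrolled into frames, with an optional explicit condition and optional randomness), here is a toy NumPy sketch. It is not the MAGE architecture: this page gives no model details, so every name (`build_motion_anchor`, `generate_video`, `speed`) and the arithmetic update standing in for the learned generator are assumptions made purely for illustration.

```python
import numpy as np


def build_motion_anchor(image_feat, text_feat):
    """Toy stand-in for an appearance-motion aligned representation.

    image_feat: (H, W, C) array of per-location appearance features.
    text_feat:  (C,) array standing in for an encoded text description.
    The element-wise product is an assumption, not the paper's fusion.
    """
    return image_feat * text_feat  # broadcasts text over all (H, W) locations


def generate_video(image_feat, text_feat, num_frames, speed=1.0, rng=None):
    """Unroll frames one at a time, starting from the given image.

    speed: hypothetical explicit condition scaling the motion per step.
    rng:   optional np.random.Generator; when given, noise is added to the
           anchor so that repeated calls can yield different videos.
    """
    anchor = build_motion_anchor(image_feat, text_feat)
    if rng is not None:
        # Implicit randomness: perturb the anchor once per video.
        anchor = anchor + rng.normal(scale=0.1, size=anchor.shape)

    frames = [image_feat]
    for _ in range(num_frames - 1):
        # Placeholder update; a learned generator would go here.
        frames.append(frames[-1] + speed * anchor)
    return np.stack(frames)  # (num_frames, H, W, C)


if __name__ == "__main__":
    image = np.ones((2, 2, 3))
    text = np.array([0.5, 0.0, -0.5])
    video = generate_video(image, text, num_frames=4, speed=1.0)
    print(video.shape)
```

In this sketch, calling `generate_video` without `rng` is deterministic, while passing generators with different seeds yields different outputs for the same image and text, which mirrors the controllability-versus-diversity split at a purely conceptual level.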

Code & Models

Repositories

youncy-hu/mage (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Motion and Animation