AtomoVideo: High Fidelity Image-to-Video Generation

Litong Gong; Yiran Zhu; Weijie Li; Xiaoyang Kang; Biao Wang; Tiezheng Ge; Bo Zheng

arXiv:2403.01800 · cs.CV · March 6, 2024 · 1 citation

TL;DR

AtomoVideo is a high fidelity image-to-video generation framework that leverages multi-granularity image injection to produce videos with greater motion, temporal consistency, and stability, while being adaptable to long sequences and personalized models.

Contribution

The paper introduces a novel high fidelity image-to-video generation framework, AtomoVideo, with multi-granularity image injection and adaptable architecture for long sequences and personalization.

Findings

01. Achieves higher fidelity in generated videos compared to existing methods.

02. Maintains superior temporal consistency and stability in video outputs.

03. Enables long sequence prediction through iterative generation.

Abstract

Recently, video generation has developed rapidly, building on superior text-to-image generation techniques. In this work, we propose a high fidelity framework for image-to-video generation, named AtomoVideo. Based on multi-granularity image injection, we achieve higher fidelity of the generated video to the given image. In addition, thanks to high quality datasets and training strategies, we achieve greater motion intensity while maintaining superior temporal consistency and stability. Our architecture extends flexibly to the video frame prediction task, enabling long sequence prediction through iterative generation. Furthermore, due to the design of adapter training, our approach can be well combined with existing personalized models and controllable modules. Through quantitative and qualitative evaluation, AtomoVideo achieves superior results compared to popular…
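
The iterative generation mentioned in the abstract can be pictured as a loop that feeds the most recent frames back in as conditioning for the next clip. The following is only a minimal sketch of that general idea, not the authors' implementation: `generate_clip`, `clip_len`, `num_context`, and the toy model are hypothetical names and values chosen for illustration, and the summary above does not say how many frames AtomoVideo conditions on or generates per step.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def generate_long_video(
    first_frame: Frame,
    generate_clip: Callable[[Sequence[Frame], int], List[Frame]],
    total_frames: int,
    clip_len: int = 16,      # assumed clip length, not taken from the paper
    num_context: int = 1,    # assumed number of conditioning frames
) -> List[Frame]:
    """Extend a video by repeatedly predicting frames from the latest ones.

    generate_clip(context, n) is a hypothetical stand-in for the model: given
    the conditioning frames, it returns n NEW frames that follow them.
    """
    if total_frames < 1 or clip_len < 1 or num_context < 1:
        raise ValueError("total_frames, clip_len and num_context must be >= 1")

    video = [first_frame]
    while len(video) < total_frames:
        # The most recent frames condition the next clip.
        context = video[-num_context:]
        # Never generate more frames than are still needed.
        n_new = min(clip_len, total_frames - len(video))
        new_frames = generate_clip(context, n_new)
        if len(new_frames) != n_new:
            raise ValueError("generate_clip returned an unexpected number of frames")
        video.extend(new_frames)
    return video


def toy_clip_model(context: Sequence[int], n: int) -> List[int]:
    """Toy stand-in: frames are integers and each new frame increments the last."""
    last = context[-1]
    return [last + i for i in range(1, n + 1)]


if __name__ == "__main__":
    frames = generate_long_video(0, toy_clip_model, total_frames=40, clip_len=16)
    print(len(frames), frames[:3], frames[-1])
```

With a real model, `generate_clip` would wrap the image-to-video network and the frames would be image tensors. Because each clip is conditioned on generated frames rather than ground truth, errors can accumulate over long sequences, which is why temporal consistency matters for this use.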

Taxonomy

Topics: Advanced Vision and Imaging

Methods: Adapter