SimDA: Simple Diffusion Adapter for Efficient Video Generation

Zhen Xing; Qi Dai; Han Hu; Zuxuan Wu; Yu-Gang Jiang

arXiv:2308.09710 · cs.CV · August 21, 2023 · 1 citation

Open Access

TL;DR

SimDA adapts a large text-to-image (T2I) diffusion model to video generation by fine-tuning only 24M of its 1.1B parameters, using lightweight spatial and temporal adapters together with Latent-Shift Attention (LSA) for temporal consistency.
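To put the parameter budget in perspective, the short calculation below works out what share of the model is actually trained. It uses the two figures quoted in the abstract (24M trainable parameters out of 1.1B); the function and constant names are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract: 24M trainable out of 1.1B total parameters.
TRAINABLE_PARAMS = 24_000_000
TOTAL_PARAMS = 1_100_000_000


def trainable_fraction(trainable: int, total: int) -> float:
    """Return the share of parameters that are updated during fine-tuning."""
    if total <= 0:
        raise ValueError("total must be positive")
    return trainable / total


fraction = trainable_fraction(TRAINABLE_PARAMS, TOTAL_PARAMS)
print(f"{fraction:.2%} of parameters are fine-tuned")  # prints "2.18% of parameters are fine-tuned"
```

In other words, roughly 98% of the T2I model's parameters are left untouched during adaptation.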

Contribution

The paper presents a lightweight adaptation framework that converts a T2I model into a T2V model by tuning only a small set of adapter parameters and replacing the original spatial attention with Latent-Shift Attention.

Findings

01. Achieves high-definition video generation with minimal fine-tuning.

02. Utilizes a lightweight spatial and temporal adapter design.

03. Enables quick one-shot video editing with only 2 minutes of tuning.

Abstract

The recent wave of AI-generated content has witnessed the great development and success of Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of expectations despite attracting increasing interest. Existing works either train from scratch or adapt large T2I models to videos, both of which are expensive in computation and resources. In this work, we propose a Simple Diffusion Adapter (SimDA) that fine-tunes only 24M out of 1.1B parameters of a strong T2I model, adapting it to video generation in a parameter-efficient way. In particular, we turn the T2I model into a T2V model by designing light-weight spatial and temporal adapters for transfer learning. In addition, we replace the original spatial attention with the proposed Latent-Shift Attention (LSA) for temporal consistency. With similar model architecture, we further train a video super-resolution model to generate…
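The abstract does not spell out the internals of the spatial and temporal adapters, so the sketch below is only a generic bottleneck adapter of the kind commonly used in parameter-efficient transfer learning: a down-projection, a nonlinearity, an up-projection, and a residual connection. The class name `BottleneckAdapter`, the GELU activation, the zero-initialized up-projection, and the toy dimensions are all assumptions made for illustration, not details confirmed by the paper. It is written in plain Python so that the arithmetic is easy to follow.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU, computed with the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class BottleneckAdapter:
    """Hypothetical adapter: y = x + W_up @ gelu(W_down @ x + b_down) + b_up."""

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection (bottleneck x dim) with small random weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection (dim x bottleneck) starts at zero (an assumption),
        # so the adapter is an identity map before any training step.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def num_params(self) -> int:
        """Count weights and biases of both projections."""
        dim, r = len(self.w_up), len(self.w_down)
        return 2 * dim * r + dim + r

    def __call__(self, x: list[float]) -> list[float]:
        hidden = [gelu(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_down, self.b_down)]
        update = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w_up, self.b_up)]
        # Residual connection: the frozen feature passes through unchanged
        # and the adapter only adds a learned correction on top of it.
        return [xi + ui for xi, ui in zip(x, update)]


adapter = BottleneckAdapter(dim=8, bottleneck=2)
x = [0.1 * i for i in range(8)]
y = adapter(x)
print(y == x)                # True: zero-initialized adapter is an identity
print(adapter.num_params())  # 42 = 2*8*2 + 8 + 2
```

Because the parameter count grows as 2·dim·bottleneck rather than dim², a small bottleneck keeps each adapter cheap, which is the general mechanism that lets adapter-based methods train only a small fraction of a large frozen model.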

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Vision and Imaging

Methods: Adapter · Diffusion