FrameBridge: Improving Image-to-Video Generation with Bridge Models
Yuji Wang, Zehua Chen, Xiaoyu Chen, Yixiang Wei, Jun Zhu, Jianfei Chen

TL;DR
FrameBridge models image-to-video (I2V) generation as a data-to-data bridge process rather than a noise-to-data diffusion process, so that generation starts from the given image. It adds two training techniques, SNR-Aligned Fine-tuning and a neural prior, and is reported to outperform existing diffusion-based methods.
Contribution
The paper presents a new bridge model framework for image-to-video generation, including SNR-Aligned Fine-tuning and neural prior techniques, improving synthesis quality and enabling effective training from scratch.
Findings
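FrameBridge is reported to outperform diffusion-based I2V models on synthesis-quality metrics.
SNR-Aligned Fine-tuning (SAF) allows a bridge model to be fine-tuned from a pre-trained diffusion-based text-to-video (T2V) model.
The neural prior improves training bridge models from scratch for I2V tasks.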
Abstract
Diffusion models have achieved remarkable progress on image-to-video (I2V) generation, while their noise-to-data generation process is inherently mismatched with this task, which may lead to suboptimal synthesis quality. In this work, we present FrameBridge. By modeling the frame-to-frames generation process with a bridge model based data-to-data generative process, we are able to fully exploit the information contained in the given image and improve the consistency between the generation process and I2V task. Moreover, we propose two novel techniques toward the two popular settings of training I2V models, respectively. Firstly, we propose SNR-Aligned Fine-tuning (SAF), making the first attempt to fine-tune a diffusion model to a bridge model and, therefore, allowing us to utilize the pre-trained diffusion-based text-to-video (T2V) models. Secondly, we propose neural prior, further…
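To make the data-to-data idea in the abstract concrete, the sketch below samples intermediate states of a generic Brownian bridge between a target video and a prior built by repeating the input frame. This is an illustration only, not FrameBridge's formulation: the bridge schedule, the constant `sigma`, the array shapes, and the helper names `replicate_frame` and `brownian_bridge_sample` are all assumptions, since the summary above does not specify them.

```python
import numpy as np


def replicate_frame(image, num_frames):
    """Hypothetical helper: build a static-video prior by repeating one frame."""
    return np.repeat(image[None, ...], num_frames, axis=0)


def brownian_bridge_sample(x0, xT, t, T=1.0, sigma=1.0, rng=None):
    """Sample z_t from a Brownian bridge pinned at x0 (t = 0) and xT (t = T).

    The marginal is Gaussian with
        mean = (1 - t/T) * x0 + (t/T) * xT
        var  = sigma**2 * t * (T - t) / T
    so the noise vanishes at both endpoints. A noise-to-data diffusion
    process, by contrast, ends in pure noise that ignores the input image.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = t / T
    mean = (1.0 - w) * x0 + w * xT
    std = sigma * np.sqrt(t * (T - t) / T)
    return mean + std * rng.standard_normal(x0.shape)


# Toy shapes (assumed): 8 frames of 4x4 "latents".
rng = np.random.default_rng(0)
image = rng.standard_normal((4, 4))      # the conditioning frame
video = rng.standard_normal((8, 4, 4))   # stand-in for the target video
prior = replicate_frame(image, num_frames=8)

z_start = brownian_bridge_sample(video, prior, t=0.0, rng=rng)  # equals video
z_mid = brownian_bridge_sample(video, prior, t=0.5, rng=rng)    # noisy blend
z_end = brownian_bridge_sample(video, prior, t=1.0, rng=rng)    # equals prior
```

In this toy setup, sampling would start from `prior` (the repeated input frame) and move toward a video, which is the sense in which a bridge uses the information in the given image at every step.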
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need · Diffusion
