Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Xin Yuan, Jinoo Baek, Keyang Xu, Omer Tov, Hongliang Fei

TL;DR
This paper introduces an efficient diffusion-based method for text-to-video super-resolution that leverages an inflated image diffusion model and a temporal adapter to ensure high-quality, temporally coherent video generation.
Contribution
It presents a novel architecture that inflates a text-to-image super-resolution model for video, incorporating a temporal adapter for coherence, with a comprehensive analysis of trade-offs.
Findings
Achieves high visual quality in text-to-video super-resolution.
Maintains temporal coherence across video frames.
Offers a flexible framework balancing quality and computational cost.
Abstract
We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach that leverages the readily learned capacity of a pixel-level image diffusion model to capture spatial information for video generation. To accomplish this goal, we design an efficient architecture by inflating the weights of the text-to-image SR model into our video generation framework. Additionally, we incorporate a temporal adapter to ensure temporal coherence across video frames. We investigate different tuning approaches based on our inflated architecture and report trade-offs between computational cost and super-resolution quality. Empirical evaluation, both quantitative and qualitative, on the Shutterstock video dataset demonstrates that our approach can perform text-to-video SR generation with good visual quality and temporal consistency. To evaluate temporal coherence, we also…
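The two ideas named in the abstract, weight inflation and a temporal adapter, can be illustrated with a dependency-free toy sketch. It is not the authors' implementation: the names `inflate` and `TemporalAdapter`, the flattened-list frame representation, and in particular the zero initialization of the adapter (a common choice that makes the inflated model start out identical to the per-frame image model) are all assumptions made for illustration.

```python
from typing import Callable, List

Frame = List[float]   # toy stand-in for one image tensor, flattened
Video = List[Frame]   # a clip is a list of T frames


def inflate(image_layer: Callable[[Frame], Frame]) -> Callable[[Video], Video]:
    """Reuse a pretrained per-image layer on video by applying it to each frame.

    Hypothetical helper: the image weights are shared across time, unchanged.
    """
    def video_layer(video: Video) -> Video:
        return [image_layer(frame) for frame in video]
    return video_layer


class TemporalAdapter:
    """Residual mixing across frames (illustrative, not the paper's design).

    weights[t][s] is the contribution of frame s to output frame t.
    Assumption: weights start at zero, so the adapter is initially an identity.
    """

    def __init__(self, num_frames: int) -> None:
        self.weights = [[0.0] * num_frames for _ in range(num_frames)]

    def __call__(self, video: Video) -> Video:
        out = []
        for t, frame in enumerate(video):
            mixed = list(frame)  # residual path: start from the frame itself
            for s, other in enumerate(video):
                w = self.weights[t][s]
                mixed = [m + w * o for m, o in zip(mixed, other)]
            out.append(mixed)
        return out


# Toy "pretrained image layer": doubles every pixel value.
def image_layer(frame: Frame) -> Frame:
    return [2.0 * x for x in frame]


video = [[1.0, 2.0], [3.0, 4.0]]      # two frames, two pixels each
video_layer = inflate(image_layer)
adapter = TemporalAdapter(num_frames=2)

# With a zero-initialized adapter, the output equals the per-frame result.
baseline = adapter(video_layer(video))

# After "tuning" one adapter weight, frame 0 also draws on frame 1.
adapter.weights[0][1] = 0.5
tuned = adapter(video_layer(video))
```

In this toy setup, tuning only the adapter weights while leaving `image_layer` untouched corresponds to the cheap end of the cost/quality trade-off the abstract mentions; tuning the inflated spatial weights as well would be the more expensive option.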
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Advanced Image Processing Techniques
Methods: Diffusion · Adapter
