CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
Haonan Qiu, Ning Yu, Ziqi Huang, Paul Debevec, Ziwei Liu

TL;DR
CineScale introduces a new inference method that allows existing pre-trained visual diffusion models to generate high-resolution images and videos, including 8K images and 4K videos, without extensive fine-tuning.
Contribution
The paper presents CineScale, a novel inference paradigm that extends high-resolution visual generation capabilities of pre-trained models across various architectures without requiring retraining.
Findings
Enables 8K image generation without fine-tuning.
Achieves 4K video generation with minimal LoRA fine-tuning.
Outperforms existing methods in high-resolution visual synthesis.
Abstract
Visual diffusion models have achieved remarkable progress, yet they are typically trained at limited resolutions due to the lack of high-resolution data and constrained computational resources, which hampers their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to unlock the untapped potential of pre-trained models for higher-resolution visual generation. However, these methods are still prone to producing low-quality visual content with repetitive patterns. The key obstacle lies in the inevitable increase in high-frequency information when the model generates visual content exceeding its training resolution, leading to undesirable repetitive patterns that derive from accumulated errors. In this work, we propose CineScale, a novel inference paradigm to enable higher-resolution visual generation. To tackle the various issues…
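The abstract attributes the repetitive patterns to an increase in high-frequency information beyond the training resolution. As a rough illustration of what "high-frequency information" can mean in practice, the sketch below measures the share of an image's spectral energy above a radial frequency cutoff. This is a generic diagnostic, not CineScale's method: the function name `high_freq_energy_ratio` and the default cutoff of 0.25 cycles per pixel are assumptions made for this example only.

```python
import numpy as np


def high_freq_energy_ratio(image, cutoff=0.25):
    """Return the fraction of spectral energy above `cutoff`.

    `cutoff` is a radial frequency in cycles per pixel (Nyquist is 0.5
    along each axis). Both the name and the default are illustrative
    assumptions, not values taken from the paper.
    """
    image = np.asarray(image, dtype=np.float64)
    # Power spectrum of the 2D image.
    power = np.abs(np.fft.fft2(image)) ** 2
    # Per-bin frequencies along each axis, in cycles per pixel.
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)


# Hypothetical usage: a smooth ramp versus the same ramp with added noise.
rng = np.random.default_rng(0)
smooth = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
noisy = smooth + 0.1 * rng.standard_normal((64, 64))
print("smooth:", high_freq_energy_ratio(smooth))
print("noisy: ", high_freq_energy_ratio(noisy))
```

One would expect the noisy image to report a larger ratio than the smooth one. Comparing this ratio between an output at the training resolution and one at a higher resolution is one simple way to check whether extra high-frequency content is present.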
Taxonomy
Topics: Cinema and Media Studies
