Creatively Upscaling Images with Global-Regional Priors
Yurui Qian, Qi Cai, Yingwei Pan, Ting Yao, Tao Mei

TL;DR
C-Upscale is a tuning-free image upscaling method that leverages global and regional priors from prompts and multimodal language models to generate ultra-high-resolution images with enhanced fidelity and creativity.
Contribution
The paper introduces C-Upscale, a novel tuning-free approach that uses global-regional priors for high-resolution image generation, addressing limitations in preserving global structure and regional detail.
Findings
Generates ultra-high-resolution images up to 8,192 × 8,192.
Achieves higher visual fidelity than existing methods.
Enhances regional detail and semantic consistency in generated images.
Abstract
Contemporary diffusion models show remarkable capability in text-to-image generation, but are still limited to restricted resolutions (e.g., 1,024 × 1,024). Recent advances enable tuning-free higher-resolution image generation by recycling pre-trained diffusion models and extending them via regional denoising or dilated sampling/convolutions. However, these models struggle to simultaneously preserve global semantic structure and produce creative regional details in higher-resolution images. To address this, we present C-Upscale, a new recipe for tuning-free image upscaling that pivots on global-regional priors derived from the given global prompt and from regional prompts estimated via a Multimodal LLM. Technically, the low-frequency component of the low-resolution image is recognized as a global structure prior to encourage global semantic consistency in high-resolution generation. Next, we perform…
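To make the "low-frequency component" idea concrete, here is a minimal sketch of one common way to isolate the low frequencies of an image: a Gaussian low-pass filter applied in the Fourier domain. The abstract does not say how C-Upscale extracts this component, so the filter choice, the function name `low_frequency_component`, and the `cutoff` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def low_frequency_component(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Return a low-pass filtered copy of a 2D grayscale image.

    Hypothetical helper: `cutoff` is the standard deviation of a Gaussian
    mask, expressed in cycles per pixel. It is an illustrative parameter,
    not a value taken from the paper.
    """
    h, w = image.shape
    # Frequency coordinates (cycles per pixel) along each axis.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Gaussian mask: 1 at the zero frequency, decaying toward high frequencies.
    mask = np.exp(-(fx ** 2 + fy ** 2) / (2.0 * cutoff ** 2))
    spectrum = np.fft.fft2(image)
    # The mask is real and symmetric, so the result is real up to rounding.
    return np.real(np.fft.ifft2(spectrum * mask))


if __name__ == "__main__":
    # A checkerboard is pure high-frequency detail on top of a mean of 0.5.
    yy, xx = np.indices((8, 8))
    checkerboard = ((yy + xx) % 2).astype(float)
    smooth = low_frequency_component(checkerboard)
    print(smooth.round(3))
```

In this sketch the filtered output keeps the coarse layout (here, only the mean intensity) and discards fine texture, which is the sense in which a low-frequency component can act as a structural reference during high-resolution generation.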
Taxonomy
Methods: Softmax · Attention Is All You Need · Diffusion
