Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Bryan Sangwoo Kim; Jeongsol Kim; Jong Chul Ye

arXiv:2505.18600·cs.CV·April 13, 2026

TL;DR

Chain-of-Zoom (CoZ) enables extreme super-resolution by decomposing a large magnification into an autoregressive chain of smaller zoom steps. Each step is guided by multi-scale-aware text prompts from a vision-language model that is fine-tuned with reinforcement learning.

Contribution

The paper introduces CoZ, a model-agnostic framework that reaches magnifications beyond 256x without retraining the backbone SR model, by combining scale autoregression with preference-aligned text prompts.
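One way to read "scale autoregression" is as a factorization over intermediate scale-states. The sketch below assumes, purely for illustration, a first-order Markov chain: $x_0$ is the low-resolution input, $x_i$ is the image after $i$ zoom steps, and $c_i$ is the text prompt used at step $i$. The symbols and the first-order assumption are illustrative and are not taken from the source, which may condition on more history.

```latex
% Illustrative first-order factorization (an assumption, not the paper's exact model)
p(x_1, c_1, \dots, x_n, c_n \mid x_0)
  = \prod_{i=1}^{n} p(c_i \mid x_{i-1}) \, p(x_i \mid x_{i-1}, c_i)
```

Under this reading, each factor $p(x_i \mid x_{i-1}, c_i)$ is a single pass of the same backbone SR model, which is why no retraining of that model is needed as $n$ grows.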

Findings

01. CoZ attains beyond 256x enlargement with high perceptual quality.

02. Using a standard 4x diffusion SR model, CoZ effectively decomposes large magnifications.

03. Multi-scale-aware prompts improve visual fidelity at high magnifications.
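The arithmetic behind these findings can be sketched as follows: if every step magnifies by 4x, four chained steps give 4^4 = 256x. The helper name `zoom_steps` is hypothetical and exists only for this illustration.

```python
def zoom_steps(target_scale: int, step_scale: int = 4) -> int:
    """Return the smallest number of step_scale-x passes whose
    combined magnification is at least target_scale."""
    if target_scale < 1 or step_scale < 2:
        raise ValueError("target_scale must be >= 1 and step_scale >= 2")
    steps, total = 0, 1
    while total < target_scale:
        total *= step_scale  # each pass multiplies the magnification
        steps += 1
    return steps


print(zoom_steps(256))  # 4, because 4 ** 4 == 256
```

Going beyond 256x with the same 4x backbone simply means running more steps, at the cost of one extra model pass per step.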

Abstract

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but collapse when asked to magnify far beyond that regime. We address this scalability bottleneck with Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). The prompt extractor itself is fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic VLM, aligning text guidance towards human preference. Experiments…
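The loop described in the abstract can be sketched roughly as below. This is a toy under stated assumptions, not the authors' implementation: `nearest_upscale` stands in for the 4x SR backbone, `describe` stands in for the VLM prompt extractor, and the centre crop before each pass is an assumed way of zooming in so that every scale-state keeps the same pixel size. All function names here are hypothetical.

```python
from typing import Callable, List, Optional

Image = List[List[float]]  # toy grayscale image: rows of pixel values


def center_crop(img: Image, factor: int) -> Image:
    """Keep the central 1/factor of the image along each axis."""
    h, w = len(img), len(img[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def nearest_upscale(img: Image, prompt: str, factor: int = 4) -> Image:
    """Placeholder for a prompt-conditioned SR backbone.
    It repeats pixels and ignores the prompt."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def describe(prev: Optional[Image], cur: Image) -> str:
    """Placeholder for the VLM prompt extractor. A real one would look at
    both the previous and the current scale-state."""
    return f"zoomed view, {len(cur)}x{len(cur[0])} px"


def chain_of_zoom(
    img: Image,
    steps: int,
    sr: Callable[[Image, str, int], Image] = nearest_upscale,
    extract_prompt: Callable[[Optional[Image], Image], str] = describe,
    factor: int = 4,
) -> List[Image]:
    """Return every scale-state, starting with the input image."""
    states = [img]
    prev: Optional[Image] = None
    for _ in range(steps):
        cur = states[-1]
        prompt = extract_prompt(prev, cur)  # text guidance for this step
        crop = center_crop(cur, factor)     # zoom into the centre region
        states.append(sr(crop, prompt, factor))  # re-use the same backbone
        prev = cur
    return states


# Two 4x steps on an 8x8 toy image give a 16x total magnification.
toy = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
states = chain_of_zoom(toy, steps=2)
print(len(states), len(states[-1]), len(states[-1][0]))
```

Swapping `nearest_upscale` for a real diffusion SR model and `describe` for a VLM call would leave the loop itself unchanged, which is the sense in which the framework is model-agnostic.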

Code & Models

Videos

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment · SlidesLive