Hierarchical Image Tokenization for Multi-Scale Image Super Resolution
Isma Hadji, Enrique Sanchez, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

TL;DR
The paper presents a novel multi-scale image super-resolution method using hierarchical tokenization and direct preference optimization, achieving state-of-the-art results with a smaller model and no external data.
Contribution
Introduces Hierarchical Image Tokenization and Direct Preference Optimization to enhance VAR-based super-resolution, enabling multi-scale outputs with reduced complexity and improved performance.
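For readers unfamiliar with Direct Preference Optimization, the sketch below shows the standard DPO objective for a single preference pair. It is not taken from the paper: this summary does not say how preference pairs are built for super-resolution or how the loss is applied to a VAR model. The function name `dpo_loss`, its arguments, and the default `beta` are illustrative assumptions.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: log-probabilities of the preferred and rejected outputs
    under the model being trained; ref_logp_*: the same quantities under a
    frozen reference model. All inputs here are hypothetical scalars.
    """
    # How much more the trained model favours the preferred output
    # than the reference model does, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Identical behaviour to the reference gives a zero margin.
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Raising the preferred output and lowering the rejected one reduces the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss only depends on the difference between the two log-ratios, so it rewards the model for separating preferred from rejected outputs more strongly than the reference does.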
Findings
Achieves state-of-the-art super-resolution results.
Produces multi-scale outputs with a single forward pass.
Uses a smaller model (300M parameters) and no external data.
Abstract
We introduce a multi-scale Image Super Resolution (ISR) method building on recent advances in Visual Auto-Regressive (VAR) modeling. VAR models break image tokenization into additive, gradually increasing scales, using Residual Quantization (RQ), an approach that aligns perfectly with our target ISR task. Previous works taking advantage of this synergy suffer from two main shortcomings. First, due to the limitations in RQ, they only generate images at a predefined fixed scale, failing to map intermediate outputs to the corresponding image scales. They also rely on large backbones or a large corpus of annotated data to achieve better performance. To address both shortcomings, we introduce two novel components to the VAR training for ISR, aiming at increasing its flexibility and reducing its complexity. In particular, we introduce a) a Hierarchical Image Tokenization (HIT)…
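To make the "additive, gradually increasing scales" idea concrete, here is a minimal sketch of multi-scale residual quantization on a 1D signal. It illustrates the general RQ scheme, not the paper's tokenizer and not HIT: the uniform rounding grid stands in for a learned codebook, and every function name, the `step` value, and the scale list are assumptions made for illustration.

```python
def avg_pool(x, size):
    """Average-pool a 1D list down to `size` entries (len(x) divisible by size)."""
    block = len(x) // size
    return [sum(x[i * block:(i + 1) * block]) / block for i in range(size)]

def upsample(x, length):
    """Nearest-neighbour upsample a 1D list to `length` entries."""
    rep = length // len(x)
    return [v for v in x for _ in range(rep)]

def quantize(x, step=0.25):
    """Stand-in for a codebook lookup: snap each value to a uniform grid."""
    return [round(v / step) * step for v in x]

def residual_quantize(signal, scales, step=0.25):
    """Encode `signal` as a coarse-to-fine sum of quantized residual maps."""
    residual = list(signal)
    recon = [0.0] * len(signal)
    tokens, errors = [], []
    for s in scales:
        # Quantize what is still unexplained, at the current (coarser) scale.
        q = quantize(avg_pool(residual, s), step)
        up = upsample(q, len(signal))
        # Each scale adds its contribution to the full-resolution reconstruction.
        recon = [r + u for r, u in zip(recon, up)]
        residual = [r - u for r, u in zip(residual, up)]
        tokens.append(q)
        errors.append(max(abs(r) for r in residual))
    return tokens, recon, errors

signal = [0.1, 0.4, 0.35, 0.8, 0.9, 0.55, 0.2, 0.05]
tokens, recon, errors = residual_quantize(signal, scales=[1, 2, 4, 8])
```

In this sketch the tokens at each intermediate scale encode a residual at full resolution rather than a downsampled copy of the signal. That is one way to read the abstract's remark that RQ-based approaches do not map intermediate outputs to the corresponding image scales.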
