GramSR: Visual Feature Conditioning for Diffusion-Based Super-Resolution

Fabio D'Oronzio; Federico Putamorsi; Leonardo Zini; Marcella Cornia; Lorenzo Baraldi

arXiv:2604.25457 · cs.CV · April 29, 2026

TL;DR

GramSR introduces a diffusion-based super-resolution framework that replaces text conditioning with dense visual features, enabling faithful restoration and superior texture preservation in single-image SR tasks.

Contribution

GramSR is a one-step diffusion SR method that conditions on dense visual features instead of text and uses a three-stage LoRA architecture to improve detail and texture recovery.

Findings

01 Outperforms existing diffusion-based SR methods on standard benchmarks.

02 Achieves better structural fidelity and texture realism.

03 Provides flexible control over different restoration aspects during inference.

Abstract

Despite recent advances, single-image super-resolution (SR) remains challenging, especially in real-world scenarios with complex degradations. Diffusion-based SR methods, particularly those built on Stable Diffusion, leverage strong generative priors but commonly rely on text conditioning derived from semantic captioning. Such textual descriptions provide only high-level semantics and lack the spatially aligned visual information required for faithful restoration, leading to a representation gap between abstract semantics and spatially aligned visual details. To address this limitation, we propose GramSR, a one-step diffusion-based SR framework that replaces text conditioning with dense visual features extracted from the low-resolution input using a pre-trained DINOv3 encoder. GramSR adopts a three-stage LoRA architecture, where pixel-level, semantic-level, and texture-level LoRA…
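
To make the conditioning idea concrete, here is a minimal sketch of cross-attention in which the context tokens are spatial patch features rather than text embeddings. It is not the GramSR implementation: the patch features, the projection `w_proj`, the key/value weights, and all shapes are hypothetical toy values chosen for illustration, and a real system would take the features from a pre-trained encoder such as DINOv3 and feed them to the diffusion backbone.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context, w_k, w_v):
    """Single-head cross-attention: each query attends over the context tokens."""
    keys = matmul(context, w_k)
    values = matmul(context, w_v)
    scale = 1.0 / math.sqrt(len(keys[0]))
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values), weights

# Hypothetical stand-in for encoder output: 4 patch tokens (a 2x2 grid), width 3.
patch_features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

# Hypothetical linear projection from feature width 3 to context width 2.
w_proj = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
visual_context = matmul(patch_features, w_proj)  # 4 tokens x 2 channels

# Toy latent queries (2 positions) and identity key/value weights.
latent_queries = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[1.0, 0.0], [0.0, 1.0]]
w_v = [[1.0, 0.0], [0.0, 1.0]]

out, attn = cross_attention(latent_queries, visual_context, w_k, w_v)
```

Each row of `attn` is a distribution over the four patch tokens, so every latent position draws on spatially indexed features from the input image. A caption-derived context would instead supply tokens with no positional correspondence to the image, which is the representation gap the abstract describes.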

Code & Models

Repositories

aimagelab/GramSR (GitHub)
