F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model

Devendra K. Jangid; Ripon K. Saha; Dilshan Godaliyadda; Jing Li; Seok-Jun Lee; Hamid R. Sheikh

arXiv:2512.24473·cs.CV·January 1, 2026

TL;DR

This paper introduces F2IDiff, a single-image super-resolution method that conditions a diffusion foundation model on lower-level DINOv2 features instead of text. The aim is to improve image quality while reducing hallucinations in real-world smartphone images, where text-based conditioning falls short.

Contribution

The paper proposes a Single Image Super-Resolution (SISR) approach that uses DINOv2 features for conditioning, improving detail preservation and reducing hallucinations when super-resolving high-fidelity smartphone images.

Findings

01. Improved super-resolution quality with fewer hallucinations.

02. Effective use of lower-level features for image conditioning.

03. Better preservation of subtle textures in super-resolved images.

Abstract

With the advent of Generative AI, Single Image Super-Resolution (SISR) quality has seen substantial improvement, as the strong priors learned by Text-2-Image Diffusion (T2IDiff) Foundation Models (FM) can bridge the gap between High-Resolution (HR) and Low-Resolution (LR) images. However, flagship smartphone cameras have been slow to adopt generative models because strong generation can lead to undesirable hallucinations. For substantially degraded LR images, as seen in academia, strong generation is required and hallucinations are more tolerable because of the wide gap between LR and HR images. In contrast, in consumer photography, the LR image has substantially higher fidelity, requiring only minimal hallucination-free generation. We hypothesize that generation in SISR is controlled by the stringency and richness of the FM's conditioning feature. First, text features are high level…

Taxonomy

Topics: Advanced Image Processing Techniques · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis