Image Super-Resolution with Text Prompt Diffusion

Zheng Chen; Yulun Zhang; Jinjin Gu; Xin Yuan; Linghe Kong; Guihai Chen; Xiaokang Yang

arXiv:2311.14282·cs.CV·March 12, 2025·6 cites

TL;DR

This paper introduces a novel image super-resolution method called PromptSR that uses text prompts as priors to improve reconstruction, leveraging multi-modal large language models and text-image datasets for enhanced performance.

Contribution

The paper proposes integrating text prompts into image super-resolution to provide degradation priors, utilizing multi-modal models for improved SR results.

Findings

01 Significant performance improvements on synthetic and real-world images.

02 Effective use of text prompts as priors in SR tasks.

03 Successful integration of multi-modal large language models.

Abstract

Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits the model performance. To boost image SR performance, one feasible approach is to introduce additional priors. Inspired by advancements in multi-modal methods and text prompt image processing, we introduce text prompts to image SR to provide degradation priors. Specifically, we first design a text-image generation pipeline to integrate text into the SR dataset through the text degradation representation and degradation model. By adopting a discrete design, the text representation is flexible and user-friendly. Meanwhile, we propose the PromptSR to realize the text prompt SR. The PromptSR leverages the latest multi-modal large language model…
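As a rough illustration of what a discrete text degradation representation could look like, the sketch below bins numeric degradation strengths into coarse level words and joins them into a prompt. The degradation types are taken from those mentioned in the reviews; the level vocabulary, bin thresholds, parameter scales, and the function names `level_for` and `build_prompt` are all hypothetical, chosen for illustration, and are not the authors' pipeline.

```python
from bisect import bisect_right

# Hypothetical level vocabulary; the paper's actual wording may differ.
LEVELS = ("light", "medium", "heavy")

# Hypothetical bin edges per degradation (two edges give three levels).
BIN_EDGES = {
    "blur": (1.0, 2.5),           # assumed: Gaussian kernel sigma
    "noise": (10.0, 25.0),        # assumed: noise std on a 0-255 scale
    "compression": (30.0, 60.0),  # assumed: 100 - JPEG quality
}


def level_for(kind: str, strength: float) -> str:
    """Map a continuous degradation strength to a discrete level word."""
    return LEVELS[bisect_right(BIN_EDGES[kind], strength)]


def build_prompt(params: dict[str, float]) -> str:
    """Compose a prompt from the given degradations in a fixed order."""
    parts = [f"{level_for(k, params[k])} {k}" for k in BIN_EDGES if k in params]
    return ", ".join(parts)


print(build_prompt({"blur": 3.0, "noise": 5.0, "compression": 45.0}))
```

Because each degradation maps to one of a few words, a user could write or edit such a prompt by hand without knowing the underlying numeric parameters, which is one plausible reading of the "flexible and user-friendly" claim.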

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01·Rating 6·Confidence 5

Strengths

1. The proposed text prompt covers several degradations, such as Blur, Resize, Noise, and Compression. The reviewer wonders why it contains the scale factor. Is it flexible to embed the scale factor into the SR model?
2. This paper introduces a new pipeline for measuring the degradation, which serves as a prior to effectively guide deep models.
3. The authors verify the effectiveness of the proposed PromptSR compared with existing generative-based models, including FeMaSR, DiffBIR, etc.
4. The anal

Weaknesses

1. The proposed text prompt covers several degradations, such as Blur, Resize, Noise, and Compression. The reviewer wonders why it contains the scale factor. Is it flexible to embed the scale factor into the SR model?
2. The reviewer would like to know the inference time.
3. Do the authors consider the out-of-distribution case at inference? For example, the test image contains blur and noise, but the text prompt at inference has only 3 text prompts, e.g. blur, noise, compression.
4. Do the a

Reviewer 02·Rating 3·Confidence 4

Strengths

The proposed method is well described.

Weaknesses

1. The use of textual prompts in image super-resolution tasks is not novel, yet the paper lacks discussion and comparison with existing methods like PASD [1] and SeeSR [2], which also employ textual prompts.
2. In Section 3.1.2, the paper claims that textual prompts depicting degradation are superior to prompts based on image content for conditioning the denoiser network, referencing Figure 3 as evidence. However, it does not clarify how the "overall caption" result was generated, so further ex

Reviewer 03·Rating 3·Confidence 5

Strengths

- The paper is well-structured.
- It explores the effective role of textual information in the super-resolution task.

Weaknesses

1. **Limited Novelty**: The idea of degradation-guided RealSR has been extensively explored in numerous low-level vision papers, including but not limited to:
   - *Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution (DASR)*
   - *Textual Prompt Guided Image Restoration*
   - *DCS-RISR: Dynamic Channel Splitting for Efficient Real-World Image Super-Resolution*
   - *DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution*
2. **

Reviewer 04·Rating 5·Confidence 5

Strengths

1. The work develops a text-image generation pipeline that integrates prompts into the SR dataset via the text representation and degradation model.
2. The work proposes PromptSR, which utilizes the pre-trained language model to improve the restoration.
3. Experiments show the effectiveness of the proposed method.

Weaknesses

1. The work lacks comparison with state-of-the-art methods [1,2,3].
2. The work should conduct experiments on more real-world datasets, e.g. the DRealSR dataset.
3. The work does not show how user-friendly and flexible the prompt is. To some extent, it is also flexible to directly give the user a 0-1 value as the strength of each degradation.
4. Using a text encoder to encode discrete degradations is somewhat redundant. Does the method still work when the degradation description is changed (e.g., h

Code & Models

Repositories

zhengchen1999/promptsr·PyTorch·Official

Taxonomy

Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications · Image and Signal Denoising Methods

Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Inverse Square Root Schedule · Layer Normalization · Linear Layer · Attention Dropout · Gated Linear Unit · SentencePiece