Image Super-Resolution with Text Prompt Diffusion
Zheng Chen, Yulun Zhang, Jinjin Gu, Xin Yuan, Linghe Kong, Guihai Chen, Xiaokang Yang

TL;DR
This paper introduces a novel image super-resolution method called PromptSR that uses text prompts as priors to improve reconstruction, leveraging multi-modal large language models and text-image datasets for enhanced performance.
Contribution
The paper proposes integrating text prompts into image super-resolution to provide degradation priors, utilizing multi-modal models for improved SR results.
Findings
Significant performance improvements on synthetic and real-world images.
Effective use of text prompts as priors in SR tasks.
Successful integration of multi-modal large language models.
Abstract
Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits the model performance. To boost image SR performance, one feasible approach is to introduce additional priors. Inspired by advancements in multi-modal methods and text prompt image processing, we introduce text prompts to image SR to provide degradation priors. Specifically, we first design a text-image generation pipeline to integrate text into the SR dataset through the text degradation representation and degradation model. By adopting a discrete design, the text representation is flexible and user-friendly. Meanwhile, we propose the PromptSR to realize the text prompt SR. The PromptSR leverages the latest multi-modal large language model…
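The abstract describes a discrete text representation of degradation that is paired with a degradation model. The sketch below illustrates one way such a discretization could work; it is not the authors' implementation. The level names (`light`, `medium`, `heavy`), the normalized `strength` in [0, 1], the `Degradation` class, and the helpers `discretize` and `build_prompt` are all assumptions made for illustration. Only the degradation types (blur, resize, noise, compression) come from the reviews quoted on this page.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical level vocabulary; the paper's actual wording is not given here.
LEVELS = ("light", "medium", "heavy")


@dataclass(frozen=True)
class Degradation:
    """One degradation step applied to a high-resolution image (illustrative)."""
    name: str        # e.g. "blur", "resize", "noise", "compression"
    strength: float  # assumed to be normalized to [0, 1]


def discretize(strength: float) -> str:
    """Map a continuous strength in [0, 1] to one of the discrete level words."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    # Equal-width bins; a strength of exactly 1.0 falls into the last bin.
    index = min(int(strength * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]


def build_prompt(degradations: Iterable[Degradation]) -> str:
    """Compose a text prompt that lists each degradation with its discrete level."""
    return ", ".join(f"{discretize(d.strength)} {d.name}" for d in degradations)


if __name__ == "__main__":
    pipeline = [Degradation("blur", 0.1), Degradation("noise", 0.9)]
    print(build_prompt(pipeline))
```

Under these assumptions, a user writes a few level words per degradation instead of exact numeric parameters, which is one reading of the "flexible and user-friendly" claim in the abstract.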
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The proposed prompt text includes several degradations, such as blur, resize, noise, and compression. The reviewer would like to know why it also contains the scale factor. Is it flexible to embed the scale factor into the SR model? 2. This paper introduces a new pipeline for measuring degradation, which serves as a prior to effectively guide deep models. 3. The authors verify the effectiveness of the proposed PromptSR against existing generative models, including FeMaSR, DiffBIR, etc. 4, The anal
1. The proposed prompt text includes several degradations, such as blur, resize, noise, and compression. The reviewer would like to know why it also contains the scale factor. Is it flexible to embed the scale factor into the SR model? 2. The reviewer would like to know the inference time. 3. Do the authors consider the out-of-distribution case at inference? For example, the test image contains blur and noise, but the text prompt at inference has only 3 text prompts, e.g. blur, noise, compression. 4, Do the a
The proposed method is well described.
1. The use of textual prompts in image super-resolution tasks is not novel, yet the paper lacks discussion and comparison with existing methods like PASD [1] and SeeSR [2], which also employ textual prompts. 2. In Section 3.1.2, the paper claims that textual prompts depicting degradation are superior to prompts based on image content for conditioning the denoiser network, referencing Figure 3 as evidence. However, it does not clarify how the "overall caption" result was generated, so further ex
- The paper is well-structured. - It explores the effective role of textual information in the super-resolution task.
1. **Limited Novelty**: The idea of degradation-guided RealSR has been extensively explored in numerous low-level vision papers, including but not limited to: - *Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution (DASR)* - *Textual Prompt Guided Image Restoration* - *Dcs-risr: Dynamic Channel Splitting for Efficient Real-World Image Super-Resolution* - *DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution* 2. **
1. The work develops a text-image generation pipeline that integrates prompts into the SR dataset via a text representation and a degradation model. 2. The work proposes PromptSR, which uses a pre-trained language model to improve restoration. 3. Experiments show the effectiveness of the proposed method.
1. The work lacks comparison with state-of-the-art methods [1,2,3]. 2. The work should conduct experiments on more real-world datasets, e.g. DRealSR dataset. 3. The work does not show how user-friendly and flexible the prompt is. To some extent, it is also flexible to directly give the user a 0-1 value as the strength of each degradation. 4. Using a text encoder to encode discrete degradations is somewhat redundant. Does the method still work when the degradation description is changed (e.g., h
Taxonomy
Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications · Image and Signal Denoising Methods
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Inverse Square Root Schedule · Layer Normalization · Linear Layer · Attention Dropout · Gated Linear Unit · SentencePiece
