Prompt-tuning latent diffusion models for inverse problems

Hyungjin Chung; Jong Chul Ye; Peyman Milanfar; Mauricio Delbracio

arXiv:2310.01110 · cs.LG · October 3, 2023 · 2 cites

TL;DR

This paper introduces P2L, a prompt tuning method for latent diffusion models that enhances inverse problem solving by optimizing text prompts and maintaining latent variables within the encoder's range, leading to improved image reconstruction.

Contribution

The paper presents a novel prompt tuning approach combined with latent space projection to improve inverse problem solutions using latent diffusion models.

Findings

1. P2L outperforms existing inverse problem solvers on multiple tasks.

2. Prompt tuning improves the faithfulness of generated images.

3. Latent space projection reduces artifacts in reconstructed images.

Abstract

We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To address this limitation, we introduce a method for prompt tuning, which jointly optimizes the text embedding on-the-fly while running the reverse diffusion process. This allows us to generate images that are more faithful to the diffusion prior. In addition, we propose a method to keep the evolution of latent variables within the range space of the encoder, by projection. This helps to reduce image artifacts, a major problem when using latent diffusion models instead of pixel-based diffusion models. Our combined method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers…
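The two ideas in the abstract, alternating updates of the text embedding and the latent followed by a projection onto the encoder's range, can be illustrated with a minimal toy sketch. This is not the authors' implementation: the linear `decode`, `encode`, and `clean_estimate` functions, the matrices `Q`, `P`, `M`, `A`, and all dimensions are hypothetical stand-ins chosen so the example runs with NumPy alone, and the noise schedule of a real reverse diffusion process is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m, p = 8, 4, 3, 5  # pixel, latent, embedding, measurement sizes (toy values)

# Hypothetical linear stand-ins, NOT a real VAE or diffusion model.
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))  # decoder with orthonormal columns
P = np.diag([1.0, 1.0, 1.0, 0.0])                 # encoder range: first 3 latent axes
M = rng.standard_normal((k, m))                   # how the embedding shifts the estimate
A = rng.standard_normal((p, n))                   # forward (measurement) operator


def decode(z):
    return Q @ z


def encode(x):
    # With this construction, encode(decode(z)) == P @ z, an orthogonal projection.
    return P @ (Q.T @ x)


def clean_estimate(z, c):
    # Stand-in for a denoised latent estimate conditioned on the embedding c.
    return z + M @ c


def loss(z, c, y):
    r = y - A @ decode(clean_estimate(z, c))
    return 0.5 * float(r @ r)


def solve(y, n_iters=200):
    z, c = np.zeros(k), np.zeros(m)
    G_z = A @ Q      # Jacobian of the measurement w.r.t. the latent
    G_c = G_z @ M    # Jacobian of the measurement w.r.t. the embedding
    step_z = 1.0 / np.linalg.norm(G_z, 2) ** 2  # 1 / Lipschitz constant
    step_c = 1.0 / np.linalg.norm(G_c, 2) ** 2
    for _ in range(n_iters):
        # (1) Prompt step: update the embedding with the latent held fixed.
        r = y - G_z @ clean_estimate(z, c)
        c = c + step_c * (G_c.T @ r)
        # (2) Latent step: update the latent with the embedding held fixed.
        r = y - G_z @ clean_estimate(z, c)
        z = z + step_z * (G_z.T @ r)
        # (3) Projection step: pull the latent back into the encoder's range.
        z = encode(decode(z))
    return z, c


z_true = P @ rng.standard_normal(k)  # ground-truth latent inside the encoder range
y = A @ decode(z_true)               # noiseless toy measurement
z_hat, c_hat = solve(y)
initial_loss = loss(np.zeros(k), np.zeros(m), y)
final_loss = loss(z_hat, c_hat, y)
```

In this toy setting the projection is exact because the decoder has orthonormal columns; with a real autoencoder, passing a latent through the decoder and back through the encoder is only an approximate projection, which is one reason the sketch should be read as an illustration of the structure rather than of the method's behavior.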

Peer Reviews

Decision · ICML 2024 Poster

Reviewer 01 · Rating 6 · marginally above the acceptance threshold · Confidence 5

Strengths

(1) The idea of learning prompts to guide the diffusion models for inverse problems is very interesting. (2) The method is technically sound.

Weaknesses

The paper lacks a theoretical analysis of, for example, convergence. The results are not very promising. (1) From Table 1, the performance (PSNR) gains of P2L are subtle. (2) From the ablation experiments in Table 4, the difference between the results obtained by not using any of the three proposed modules and the results obtained by using all of them is not significant. (3) From Table 5, the proposed proximal calibration is not that superior to the projection-based calibration, which is even

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

1. Optimizing the null-embeddings in addition to the latents is a strong contribution and very useful in several downstream applications. 2. The authors achieve state-of-the-art performance in several tasks. 3. The paper is well-written and the main points are clearly discussed with sufficient details to reproduce the results.

Weaknesses

1. In Section 3.2, the authors conduct an experiment using PSLD to show that it always diverges even if it started from a clean image. I believe that this experiment does not offer any insights because of two reasons: (i) The approximation used is not what was proposed in PSLD. In fact, the authors of PSLD show that aiming for any fixed point is not a good idea. Instead, they prove that the gluing objective helps recover the unique fixed point that exhibits contraction towards the optimal soluti

Reviewer 03 · Rating 3 · reject, not good enough · Confidence 4

Strengths

1. The motivation for using a learnable prompt to improve performance is meaningful. 2. Experiments are performed on several image inverse tasks, including super-resolution, deblurring, and inpainting.

Weaknesses

1. The writing is poor and the submission is hard to follow. 2. Why is an iterative optimization, similar to the EM algorithm, needed for optimizing the text prompt and the latent variable? The submission lacks the necessary theoretical analysis and experimental evaluation. 3. The proposed projection is similar to that proposed in Chung et al. (2023b). It is not clearly explained why the proposed approach provides good regularization, nor why it is called a projection. 4. No further an

Taxonomy

Topics: Mycobacterium research and diagnosis · Numerical methods in inverse problems · Advanced Neuroimaging Techniques and Applications

Methods: Diffusion