Regularization by Texts for Latent Diffusion Inverse Solvers
Jeongsol Kim, Geon Yeong Park, Hyungjin Chung, Jong Chul Ye

TL;DR
This paper introduces TReg, a novel regularization method for latent diffusion inverse solvers that incorporates textual descriptions to resolve ambiguities in inverse problems, enhancing accuracy and efficiency.
Contribution
The paper proposes a new text-based regularization technique for diffusion models, inspired by human perceptual biases, to better handle ill-posed inverse problems.
Findings
TReg improves inverse problem accuracy.
TReg enhances efficiency in diffusion-based solvers.
TReg effectively mitigates measurement ambiguities.
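To make "measurement ambiguity" concrete, here is a toy sketch. It uses a hypothetical two-pixel sensor that records only the sum of its pixels. This is an assumption chosen for illustration, not an operator from the paper. Two different images produce the same measurement. Plain data-consistency gradient descent never changes the pixel difference it starts with, so the initialization or prior decides the answer, not the data. `forward` and `data_consistency` are hypothetical helper names.

```python
def forward(x):
    # Hypothetical measurement operator A: a two-pixel sensor that records only the pixel sum.
    return x[0] + x[1]


def data_consistency(x, y, lr=0.1, steps=200):
    # Gradient descent on 0.5 * (A(x) - y)^2. Since dA/dx = [1, 1], both pixels get the
    # same update, so x[0] - x[1] (the null-space component of A) never changes.
    x = list(x)
    for _ in range(steps):
        residual = forward(x) - y
        x = [xi - lr * residual for xi in x]
    return x


y = 1.0
print(forward([1.0, 0.0]) == forward([0.0, 1.0]))  # True: the sensor cannot tell these apart
print(data_consistency([0.0, 0.0], y))  # approximately [0.5, 0.5]
print(data_consistency([2.0, 0.0], y))  # approximately [1.5, -0.5]
```

Both outputs satisfy the measurement exactly, yet they are different images. This is the gap that a prior, textual or otherwise, has to fill.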
Abstract
The recent development of diffusion models has led to significant progress in solving inverse problems by leveraging these models as powerful generative priors. However, challenges persist due to the ill-posed nature of such problems, often arising from ambiguities in measurements or intrinsic system symmetries. To address this, here we introduce a novel latent diffusion inverse solver, regularization by text (TReg), inspired by the human ability to resolve visual ambiguities through perceptual biases. TReg integrates textual descriptions of preconceptions about the solution during reverse diffusion sampling, dynamically reinforcing these descriptions through null-text optimization, which we refer to as adaptive negation. Our comprehensive experimental results demonstrate that TReg effectively mitigates ambiguity in inverse problems, improving both accuracy and efficiency.
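The abstract describes two mechanisms only at a high level: text conditioning during reverse sampling, and adaptive negation through null-text optimization. The pure-Python sketch below illustrates both in toy form, under loud assumptions:

- The text and null-text noise predictions are assumed to be combined with standard classifier-free guidance, eps = eps_null + w * (eps_text - eps_null).
- As a stand-in for the adaptive-negation objective, a gradient step lowers the cosine similarity between the null-text embedding and an embedding of the current estimate.
- Predictions and embeddings are plain lists rather than network outputs.

The helper names (`cfg`, `cosine_grad`, `negation_step`) and the objective are hypothetical. The paper's actual loss, encoders, and update schedule may differ.

```python
import math


def cfg(eps_null, eps_text, w):
    # Classifier-free guidance (assumed combination rule): extrapolate from the
    # null-text prediction toward the text-conditioned prediction with weight w.
    return [n + w * (t - n) for n, t in zip(eps_null, eps_text)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def cosine_grad(a, b):
    # Analytic gradient of cos(a, b) with respect to a:
    # b / (|a||b|) - (a.b) * a / (|a|^3 |b|)
    na, nb = norm(a), norm(b)
    c = dot(a, b)
    return [bi / (na * nb) - c * ai / (na ** 3 * nb) for ai, bi in zip(a, b)]


def negation_step(null_emb, image_emb, lr=0.5):
    # Hypothetical adaptive-negation update: one gradient-descent step on
    # cos(null_emb, image_emb) with respect to null_emb, making the null-text
    # embedding less similar to the embedding of the current estimate.
    g = cosine_grad(null_emb, image_emb)
    return [n - lr * gi for n, gi in zip(null_emb, g)]


# Toy usage: every vector here is made up for illustration.
eps = cfg([0.0, 1.0], [1.0, 3.0], w=7.5)  # guided noise prediction
null_emb = [1.0, 0.2, 0.0]  # hypothetical null-text embedding
image_emb = [1.0, 0.0, 0.0]  # hypothetical embedding of the current estimate
for step in range(5):
    null_emb = negation_step(null_emb, image_emb)
    print(step, round(cosine(null_emb, image_emb), 4))  # similarity shrinks each step
```

In a real latent diffusion sampler, `eps_null` would be recomputed by the denoiser from the updated null embedding at each reverse step. Here it is a fixed list, so the two parts are only shown side by side rather than wired into a sampler.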
Peer Reviews
Decision: ICLR 2025 Spotlight
* This paper introduces an explicit regularization term during training via text-driven regularization (TReg).
* The proposed TReg effectively addresses the challenge of ambiguity in inverse problems.
* The paper further introduces adaptive negation to dynamically adjust the influence of the textual guidance.
* The paper is well written and easy to follow, and extensive experiments in the main text and supplementary material demonstrate the effectiveness of the proposed method for diffusion inverse solvers.
1. Lack of overall results over the whole dataset. All experimental results are shown as visualizations or as results on subsets of specific classes.
2. I notice that in both Table 1 and Table 2, PSNR scores are higher without adaptive negation. Do the authors have any analysis of or intuition about this result?
3. (This might not be a weakness; I am just curious.) In the experiments, the text regularization is only tested with class names. What about results using natural captions (such as generated wi
Overall, the paper is well written and clearly motivated, and the authors perform reasonably comprehensive experiments. The logical flow is good, and the explanations of the proposed method and experimental setup are clear. The figures look nice. In terms of novelty, the authors introduce a reasonably novel approach to solving inverse problems by incorporating text-based regularization into latent diffusion models. This addresses a significant limitation of existi
* `Line 071`: In my view, the connection to the human brain is quite weak. I am not sure why the authors included it in the paper.
* I would like to see additional, non-cherry-picked outputs. In my experience, diffusion models used to recover images can fail in very strange ways. I am specifically interested in image inpainting results, which nearly always show boundary artifacts.
* I would like to see experiments in the presence of measurement noise, for example with JPEG compression artifa
The paper tackles an interesting problem with many potential applications. However, the individual ingredients are not completely new, even in the context of inverse problems. The use of prompts and the idea of solving the problem with an alternating direction method have been explored by Chung et al. in “PROMPT-TUNING LATENT DIFFUSION MODELS FOR INVERSE PROBLEMS”, and the optimization of the null-text has been explored by Mokady et al. in “Null-text Inversion for Editing Real Images using Guided Diffusi
The main weaknesses of this work are the lack of quantitative evaluations on full datasets, the lack of some important baselines, and writing that could be clearer. See below for more detailed comments. Evaluation: all presented tasks are evaluated on specific classes rather than averaged across full datasets, as is done in other state-of-the-art works (e.g., P2L). Specifically:
- Ambiguity reduction: it is unclear if the quantitative evaluation in Figure 3 is an average
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
