Image Inpainting via Tractable Steering of Diffusion Models

Anji Liu; Mathias Niepert; and Guy Van den Broeck

arXiv:2401.03349·cs.CV·December 12, 2024·1 cites

TL;DR

This paper introduces a novel method for image inpainting that uses tractable probabilistic models to precisely steer diffusion models, improving image quality and semantic coherence with minimal additional computation.

Contribution

The method leverages Probabilistic Circuits (PCs) to guide diffusion models for constrained inpainting, enabling exact and efficient computation of the constrained posterior and finer control over generated images.

Findings

1. Improved inpainting quality across multiple datasets.
2. Improved semantic coherence with only ~10% additional computation.
3. Enabled region-specific semantic constraints in image generation.

Abstract

Diffusion models are the current state of the art for generating photorealistic images. Controlling the sampling process for constrained image generation tasks such as inpainting, however, remains challenging since exact conditioning on such constraints is intractable. While existing methods use various techniques to approximate the constrained posterior, this paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior, and to leverage this signal to steer the denoising process of diffusion models. Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs). Building upon prior advances, we further scale up PCs and make them capable of guiding the image generation process of diffusion models. Empirical results suggest that our approach can consistently improve the overall…
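The abstract's key step is that a TPM computes the constrained posterior exactly rather than approximately. As a minimal illustration under stated assumptions (this is not the paper's PC, which is a much larger circuit trained on image data), the simplest PC is a single sum node over fully factorized product nodes with one categorical leaf per pixel. Because such a circuit is smooth and decomposable, conditioning on observed pixels and marginalizing masked ones takes one bottom-up pass whose cost is linear in circuit size. The class `MixtureOfFactorizedPC`, its methods, and the toy parameters are hypothetical.

```python
from typing import Dict, List, Sequence


class MixtureOfFactorizedPC:
    """Hypothetical toy probabilistic circuit: one sum node over K product nodes,
    each a fully factorized product of categorical leaves (one leaf per pixel)."""

    def __init__(self, weights: Sequence[float], leaves: Sequence[Sequence[Sequence[float]]]):
        # weights[k]: sum-node weight of component k (non-negative, sums to 1)
        # leaves[k][d][v]: P(x_d = v | component k); each leaf sums to 1
        self.weights = list(weights)
        self.leaves = [[list(dist) for dist in comp] for comp in leaves]

    def _component_terms(self, observed: Dict[int, int]) -> List[float]:
        # Bottom-up pass: leaves of unobserved pixels evaluate to 1
        # (their distributions sum to 1), so they are marginalized out exactly.
        terms = []
        for w, comp in zip(self.weights, self.leaves):
            term = w
            for d, v in observed.items():
                term *= comp[d][v]
            terms.append(term)
        return terms

    def joint(self, x: Sequence[int]) -> float:
        # p(x) for a fully specified image x.
        return sum(self._component_terms(dict(enumerate(x))))

    def marginal(self, observed: Dict[int, int]) -> float:
        # p(x_observed), summing out every missing pixel.
        return sum(self._component_terms(observed))

    def conditional(self, observed: Dict[int, int], d: int) -> List[float]:
        # Exact p(x_d | x_observed) for a missing pixel d.
        terms = self._component_terms(observed)
        z = sum(terms)
        num_values = len(self.leaves[0][d])
        return [
            sum(t * comp[d][v] for t, comp in zip(terms, self.leaves)) / z
            for v in range(num_values)
        ]


# Two components over three binary pixels: "mostly 0" vs "mostly 1" images.
pc = MixtureOfFactorizedPC(
    weights=[0.5, 0.5],
    leaves=[
        [[0.9, 0.1]] * 3,
        [[0.2, 0.8]] * 3,
    ],
)
observed = {0: 1, 1: 1}  # pixels 0 and 1 are known; pixel 2 is masked
print(pc.conditional(observed, 2))  # roughly [0.211, 0.789]
```

Marginalizing a missing pixel amounts to setting its leaf to 1, so the same pass yields both p(x_observed) and the exact conditional. A diffusion model offers no such closed form, which is the intractability the abstract refers to.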

Peer Reviews

Decision·ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The use of TPMs for conditional generation with diffusion models is interesting.
2. The overall quantitative results look good compared to previous methods.

Weaknesses

1. The comparison covers only six different masks. In real applications, images masked by text or patterns are also very common, so more comparisons on such arbitrarily shaped masks would be ideal.
2. The table reports only LPIPS as a quantitative measure. Since image inpainting is an ill-posed problem, a user study would be beneficial here, as in previous works such as [1][2]. [1] Towards Coherent Image Inpainting Using Denoising Dif

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

The paper appears to present a novel approach to the problem of image inpainting using diffusion models. The integration of Tractable Probabilistic Models (TPMs), specifically Probabilistic Circuits (PCs), to guide the denoising process of diffusion models is an inventive combination of existing ideas. This creative synergy seems to address the intractability issue inherent in exact conditioning required for tasks like inpainting. Additionally, the paper builds upon prior advances to scale up P

Weaknesses

1. TPMs appear to be a general design, yet this work restricts their application to image inpainting, and the intuition behind this specific choice is unclear to me. What is the potential of this method for general conditional generation?
2. In Section 6.2, the paper says "we only need to incorporate guidance from the TPM in the early denoising stages to control the global semantics of the image; fine-grained details can be later refined by the diffusion model. As a result, TPM is only requi
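The Section 6.2 passage quoted in this review rests on simple arithmetic: if the TPM is queried only during the early, noisiest denoising steps, its extra cost scales with the fraction of guided steps. The following is a hedged sketch of that reasoning only; `guided_steps`, `relative_overhead`, the guided fraction, and the TPM-to-denoiser cost ratio are all hypothetical, and the numbers are illustrative rather than taken from the paper. As Reviewer 03 notes, the real cost ratio also depends on resolution, hole size, and network architecture.

```python
from typing import List


def guided_steps(total_steps: int, guided_fraction: float) -> List[int]:
    # Denoising runs from t = total_steps - 1 (noisiest) down to t = 0.
    # Hypothetical schedule: query the TPM only in the first guided_fraction of steps.
    n_guided = round(total_steps * guided_fraction)
    return list(range(total_steps - 1, total_steps - 1 - n_guided, -1))


def relative_overhead(total_steps: int, guided_fraction: float, tpm_cost_ratio: float) -> float:
    # tpm_cost_ratio = (cost of one TPM query) / (cost of one denoiser call), an assumed constant.
    # Extra cost relative to unguided sampling: n_guided * tpm_cost_ratio / total_steps.
    n_guided = len(guided_steps(total_steps, guided_fraction))
    return n_guided * tpm_cost_ratio / total_steps


steps = guided_steps(250, 0.3)
print(len(steps), steps[0], steps[-1])            # 75 249 175
print(f"{relative_overhead(250, 0.3, 0.4):.2f}")  # 0.12
```

Under this toy model the overhead is linear in both the guided fraction and the per-query cost ratio, so any single overhead percentage is tied to a particular resolution and mask configuration.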

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 5

Strengths

- Inpainting with pre-trained diffusion models is an interesting direction to explore, and modeling p_\theta(x_0) provides an alternative angle on this problem.
- The presentation is clear and the results show the potential of the proposed method. On three datasets (CelebA-HQ, ImageNet, LSUN-Bedroom), with masks of different shapes or at different positions, the proposed approach mostly achieved better (lower) LPIPS scores.
- A practical approximation to apply the proposed methods for high-resolutio

Weaknesses

- The computational cost depends on the size of the hole as well as the network architecture and, as the authors mention, also on resolution. So instead of claiming a 10% additional computation overhead, a detailed analysis would be more helpful.
- Numerically, the LPIPS improvement over CoPaint, or even RePaint, is minor. How about other metrics, or user studies?
- Section 4.2 reads slightly disconnected from Section 4.1, especially the introduction of Equation 7.

Code & Models

Repositories

ucla-starai/tiramisu
pytorch · Official

Videos

Image Inpainting via Tractable Steering of Diffusion Models · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Computer Graphics and Visualization Techniques

Methods: Diffusion